DIRECTED EVOLUTION OF NOVEL BINDING PROTEINS

This application is a continuation-in-part of Ladner, 
Guterman, Roberts, and Markland, Ser. No. 07/487,063,
filed March 2, 1990, now pending, which is a
continuation-in-part of Ladner and Guterman, Ser. No. 07/240,160, filed
Sept. 2, 1988, now pending. Ser. No. 07/487,063 claimed 
priority under 35 U.S.C. 119 from PCT Application No. 
PCT/US89/03731, filed Sept. 1, 1989. All of the foregoing 
applications are hereby incorporated by reference. 

Cross-Reference to Related Applications

The following related and commonly-owned applications
are also incorporated by reference: 

Robert Charles Ladner, Sonia Kosow Guterman, Rachael 
Baribault Kent, and Arthur Charles Ley are named as joint 
inventors on U.S. S.N. 07/293,980, filed January 8, 1989,
and entitled GENERATION AND SELECTION OF NOVEL DNA-BINDING 
PROTEINS AND POLYPEPTIDES. This application has been 
assigned to Protein Engineering Corporation. 

Robert Charles Ladner, Sonia Kosow Guterman, and
Bruce Lindsay Roberts are named as joint inventors on
U.S. S.N. 07/470,651, filed 26 January 1990, entitled
"PRODUCTION OF NOVEL SEQUENCE-SPECIFIC DNA-ALTERING
ENZYMES", likewise assigned to Protein Engineering Corp.

Ladner, Guterman, Kent, Ley, and Markland, Ser. No.
07/558,011 is also assigned to Protein Engineering
Corporation.

BACKGROUND OF THE INVENTION 
Field of the Invention

This invention relates to development of novel
binding proteins (including mini-proteins) by an iterative
process of mutagenesis, expression, chromatographic
selection, and amplification. In this process, a gene
encoding a potential binding domain, said gene being
obtained by random mutagenesis of a limited number of
predetermined codons, is fused to a genetic element which
causes the resulting chimeric expression product to be
displayed on the outer surface of a virus (especially a
filamentous phage) or a cell. Chromatographic selection
is then used to identify viruses or cells whose genome
includes such a fused gene coding for a protein
that bound to the chromatographic target.

Information Disclosure Statement

A. Protein Structure 

The amino acid sequence of a protein determines its
three-dimensional (3D) structure, which in turn determines
protein function (EPST63, ANFI73). Shortle (SHOR85),
Sauer and colleagues (PAKU86, REID88a), and Caruthers and
colleagues (EISE85) have shown that some residues on the
polypeptide chain are more important than others in
determining the 3D structure of a protein. The 3D
structure is essentially unaffected by the identity of the
amino acids at some loci; at other loci only one or a few
types of amino acid are allowed. In most cases, loci where
wide variety is allowed have the amino acid side group
directed toward the solvent. Loci where limited variety
is allowed frequently have the side group directed toward
other parts of the protein. Thus substitutions of amino
acids that are exposed to solvent are less likely to
affect the 3D structure than are substitutions at internal
loci. (See also SCHU79, p169-171 and CREI84, p239-245,
314-315).

The secondary structure (helices, sheets, turns,
loops) of a protein is determined mostly by local
sequence. Although certain amino acids have a propensity to
appear in certain secondary structures, they will be found
from time to time in other structures, and studies of
pentapeptide sequences found in different proteins have shown that
their conformation varies considerably from one occurrence
to the next (KABS84, ARGO87). As a result, a priori
design of proteins to have a particular 3D structure is
difficult.

Several researchers have designed and synthesized 
proteins de novo (MOSE83, MOSE87, ERIC86). These designed
proteins are small and most have been synthesized in vitro
as polypeptides rather than genetically. Hecht et al.
(HECH90) have produced a designed protein genetically.
Moser et al. state that design of biologically active
proteins is currently impossible.

B. Protein Binding Activity 

Many proteins bind non-covalently but very tightly
and specifically to some other characteristic molecules
(SCHU79, CREI84). In each case the binding results from
complementarity of the surfaces that come into contact:
bumps fit into holes, unlike charges come together,
dipoles align, and hydrophobic atoms contact other
hydrophobic atoms. Although bulk water is excluded,
individual water molecules are frequently found filling
space in intermolecular interfaces; these waters usually
form hydrogen bonds to one or more atoms of the protein or
to other bound water. Thus proteins found in nature have
not attained, nor do they require, perfect complementarity
to bind tightly and specifically to their substrates.
Only in rare cases is there essentially perfect
complementarity; then the binding is extremely tight (as, for
example, avidin binding to biotin).

C. Protein Engineering 

"Protein engineering" is the art of manipulating the 
sequence of a protein in order to alter its binding
characteristics. The factors affecting protein binding
are known (CHOT75, CHOT76, SCHU79, p98-107, and CREI84,
Ch8), but designing new complementary surfaces has proved
difficult. Although some rules have been developed for
substituting side groups (SUTC87b), the side groups of
proteins are floppy and it is difficult to predict what
conformation a new side group will take. Further, the
forces that bind proteins to other molecules are all
relatively weak and it is difficult to predict the effects
of these forces.

Recently, Quiocho and collaborators (QUIO87)
elucidated the structures of several periplasmic binding
proteins from Gram-negative bacteria. They found that the
proteins, despite having low sequence homology and
differences in structural detail, have certain important
structural similarities. Based on their investigations of
these binding proteins, Quiocho et al. suggest it is
unlikely that, using current protein engineering methods,
proteins can be constructed with binding properties
superior to those of proteins that occur naturally.

Nonetheless, there have been some isolated successes.
Wilkinson et al. (WILK84) reported that a mutant of the
tyrosyl tRNA synthetase of Bacillus stearothermophilus
with the mutation Thr 51 -> Pro exhibits a 100-fold increase
in affinity for ATP. Tan and Kaiser (TANK77) and
Tschesche et al. (TSCH87) showed that changing a single amino
acid in a mini-protein greatly reduces its binding to
trypsin, but that some of the mutants retained the
parental characteristic of binding to and inhibiting
chymotrypsin, while others exhibited new binding to
elastase. Caruthers and others (EISE85) have shown that
changes of single amino acids on the surface of the lambda
Cro repressor greatly reduce its affinity for the natural
operator OR3, but greatly increase the binding of the
mutant protein to a mutant operator. Changing three
residues in subtilisin from Bacillus amyloliquefaciens to
be the same as the corresponding residues in subtilisin
from B. licheniformis produced a protease having nearly
the same activity as the latter subtilisin, even though 82
amino acid sequence differences remained (WELL87a).
Insertion of DNA encoding 18 amino acids (corresponding to
Pro-Glu-Dynorphin-Gly) into the E. coli phoA gene so that
the additional amino acids appeared within a loop of the
alkaline phosphatase protein resulted in a chimeric
protein having both phoA and dynorphin activity (FREI90).
Thus, changing the surface of a binding protein may alter
its specificity without abolishing binding activity.

D. Techniques Of Mutagenesis 

Early techniques of mutating proteins involved
manipulations at the amino acid sequence level. In the
semisynthetic method (TSCH87), the protein was cleaved into
two fragments, a residue removed from the new end of one
fragment, the substitute residue added on in its place,
and the modified fragment joined with the other, original
fragment. Alternatively, the mutant protein could be
synthesized in its entirety (TANK77).

Erickson et al. suggested that mixed amino acid
reagents could be used to produce a family of
sequence-related proteins which could then be screened by affinity
chromatography (ERIC86). They envision successive rounds
of mixed synthesis of variant proteins and purification by
specific binding. They do not discuss how residues should
be chosen for variation. Because proteins cannot be
amplified, the researchers must sequence the recovered
protein to learn which substitutions improve binding. The
researchers must limit the level of diversity so that each
variety of protein will be present in sufficient quantity
for the isolated fraction to be sequenced.

With the development of recombinant DNA techniques,
it became possible to obtain a mutant protein by mutating
the gene encoding the native protein and then expressing
the mutated gene. Several mutagenesis strategies are
known. One, "protein surgery" (DILL87), involves the
introduction of one or more predetermined mutations within
the gene of choice. A single polypeptide of completely
predetermined sequence is expressed, and its binding
characteristics are evaluated.

At the other extreme is random mutagenesis by means
of relatively nonspecific mutagens such as radiation and
various chemical agents. See Ho et al. (HOCJ85) and
Lehtovaara, E.P. Appln. 285,123.

It is possible to randomly vary predetermined
nucleotides using a mixture of bases in the appropriate
cycles of a nucleic acid synthesis procedure. The
proportion of bases in the mixture, for each position of a
codon, will determine the frequency at which each amino
acid will occur in the polypeptides expressed from the
degenerate DNA population. Oliphant et al. (OLIP86) and
Oliphant and Struhl (OLIP87) have demonstrated ligation
and cloning of highly degenerate oligonucleotides, which
were used in the mutation of promoters. They suggested
that similar methods could be used in the variation of
protein coding regions. They do not say how one should:
a) choose protein residues to vary, or b) select or screen
mutants with desirable properties. Reidhaar-Olson and
Sauer (REID88a) have used synthetic degenerate
oligonucleotides to vary simultaneously two or three residues
through all twenty amino acids. See also Vershon et al.
(VERS86a; VERS86b). Reidhaar-Olson and Sauer do not discuss the
limits on how many residues could be varied at once, nor do
they mention the problem of unequal abundance of DNA
encoding different amino acids. They looked for proteins
that either had wild-type dimerization or that did not
dimerize. They did not seek proteins having novel binding
properties and did not find any. This approach is
likewise limited by the number of colonies that can be
examined (ROBE86).

To the extent that this prior work assumes that it is
desirable to adjust the level of mutation so that there is
one mutation per protein, it should be noted that many
desirable protein alterations require multiple amino acid
substitutions and thus are not accessible through single
base changes or even through all possible amino acid
substitutions at any one residue.

D. Affinity Chromatography of Cells 

Ferenci and collaborators have published a series of
papers on the chromatographic isolation of mutants of the
maltose-transport protein LamB of E. coli (FERE82a,
FERE82b, FERE83, FERE84, CLUN84, HEIN87 and papers cited
therein). The mutants were either spontaneous or induced
with nonspecific chemical mutagens. Levels of mutagenesis
were picked to provide single point mutations or single
insertions of two residues. No multiple mutations were
sought or found.

While variation was seen in the degree of affinity
for the conventional LamB substrates maltose and starch,
there was no selection for affinity to a target molecule
not bound at all by native LamB, and no multiple mutations
were sought or found. FERE84 speculated that the affinity
chromatographic selection technique could be adapted to
development of similar mutants of other "important
bacterial surface-located enzymes", and to selecting for
mutations which result in the relocation of an
intracellular bacterial protein to the cell surface. Ferenci's
mutant surface proteins would not, however, have been
chimeras of a bacterial surface protein and an exogenous
or heterologous binding domain.

Ferenci also taught that there was no need to clone 
the structural gene, or to know the protein structure, 
active site, or sequence. The method of the present 
invention, however, specifically utilizes a cloned
structural gene. It is not possible to construct and 
express a chimeric, outer surface-directed potential 
binding protein-encoding gene without cloning. 

Ferenci did not limit the mutations to particular
loci or particular substitutions. In the present
invention, knowledge of the protein structure, active site
and/or sequence is used as appropriate to predict which
residues are most likely to affect binding activity
without unduly destabilizing the protein, and the
mutagenesis is focused upon those sites. Ferenci does not
suggest that surface residues should be preferentially
varied. In consequence, Ferenci's selection system is
much less efficient than that disclosed herein.

E. Bacterial and Viral Expression of Chimeric Surface Proteins

A number of researchers have directed unmutated
foreign antigenic epitopes to the surface of bacteria or
phage, fused to a native bacterial or phage surface
protein, and demonstrated that the epitopes were
recognized by antibodies. Thus, Charbit et al. (CHAR86)
genetically inserted the C3 epitope of the VP1 coat
protein of poliovirus into the LamB outer membrane protein
of E. coli, and determined immunologically that the C3
epitope was exposed on the bacterial cell surface.
Charbit et al. (CHAR87) likewise produced chimeras of
LamB and the A (or B) epitopes of the preS2 region of
hepatitis B virus.

A chimeric LacZ/OmpB protein has been expressed in
E. coli and is, depending on the fusion, directed to either
the outer membrane or the periplasm (SILH77). A chimeric
LacZ/OmpA surface protein has also been expressed and
displayed on the surface of E. coli cells (Weinstock et
al., WEIN83). Others have expressed and displayed on the
surface of a cell chimeras of other bacterial surface
proteins, such as E. coli type 1 fimbriae (Hedegaard and
Klemm, HEDE89) and Bacteroides nodosus type 1 fimbriae
(Jennings et al., JENN89). In none of the recited cases
was the inserted genetic material mutagenized.

Dulbecco (DULB86) suggests a procedure for
incorporating a foreign antigenic epitope into a viral surface
protein so that the expressed chimeric protein is
displayed on the surface of the virus in a manner such that
the foreign epitope is accessible to antibody. In 1985
Smith (SMIT85) reported inserting a nonfunctional segment
of the EcoRI endonuclease gene into gene III of
bacteriophage f1, "in phase". The gene III protein is a minor
coat protein necessary for infectivity. Smith
demonstrated that the recombinant phage were adsorbed by
immobilized antibody raised against the EcoRI
endonuclease, and could be eluted with acid. De la Cruz et al.
(DELA88) have expressed a fragment of the repeat region of
the circumsporozoite protein from Plasmodium falciparum on
the surface of M13 as an insert in the gene III protein.
They showed that the recombinant phage were both antigenic
and immunogenic in rabbits, and that such recombinant
phage could be used for B epitope mapping. The
researchers suggest that similar recombinant phage could be used
for T epitope mapping and for vaccine development.

None of these researchers suggested mutagenesis of
the inserted material, nor is the inserted material a
complete binding domain conferring on the chimeric protein
the ability to bind specifically to a receptor other than
the antigen combining site of an antibody.

McCafferty et al. (MCCA90) expressed a fusion of an
Fv fragment of an antibody to the N-terminus of the pIII
protein. The Fv fragment was not mutated.

F. Epitope Libraries on Fusion Phage

Parmley and Smith (PARM88) suggested that an epitope
library that exhibits all possible hexapeptides could be
constructed and used to isolate epitopes that bind to
antibodies. In discussing the epitope library, the
authors did not suggest that it was desirable to balance
the representation of different amino acids. Nor did they
teach that the insert should encode a complete domain of
the exogenous protein. Epitopes are considered to be
unstructured peptides as opposed to structured proteins.

After the filing of the parent application whose
benefit is claimed herein under 35 U.S.C. 120, certain
groups reported the construction of "epitope libraries."
Scott and Smith (SCOT90) and Cwirla et al. (CWIR90)
prepared "epitope libraries" in which potential
hexapeptide epitopes for a target antibody were randomly mutated
by fusing degenerate oligonucleotides, encoding the
epitopes, with gene III of fd phage, and expressing the
fused gene in phage-infected cells. The cells
manufactured fusion phage which displayed the epitopes on their
surface; the phage which bound to immobilized antibody
were eluted with acid and studied. In both cases, the
fused gene featured a segment encoding a spacer region to
separate the variable region from the wild type pIII
sequence so that the varied amino acids would not be
constrained by the nearby pIII sequence. Devlin et al.
(DEVL90) similarly screened, using M13 phage, for random
15 residue epitopes recognized by streptavidin. Again, a
spacer was used to move the random peptides away from the
rest of the chimeric phage protein. These references
therefore taught away from constraining the conformational
repertoire of the mutated residues.

Another problem with the Scott and Smith, Cwirla et
al., and Devlin et al. libraries was that they provided a
highly biased sampling of the possible amino acids at each
position. Their primary concern in designing the
degenerate oligonucleotide encoding their variable region was to
ensure that all twenty amino acids were encodible at each
position; a secondary consideration was minimizing the
frequency of occurrence of stop signals. Consequently,
Scott and Smith and Cwirla et al. employed NNK (N=equal
mixture of G, A, T, C; K=equal mixture of G and T) while
Devlin et al. used NNS (S=equal mixture of G and C).
There was no attempt to minimize the frequency ratio of
most favored-to-least favored amino acid, or to equalize
the rate of occurrence of acidic and basic amino acids.
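
The bias described above can be checked by enumerating the
codons that each scheme permits. The following Python sketch
is illustrative only and is not taken from any cited
reference. It assumes the standard genetic code; the function
names, and the grouping of Asp and Glu as acidic and of Lys
and Arg as basic, are assumptions made for this example.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_usage(third_bases):
    """Count codons per amino acid for an NNx scheme (N = any base)."""
    counts = Counter()
    for a, b, c in product(BASES, BASES, third_bases):
        counts[CODON_TABLE[a + b + c]] += 1
    return counts


def summarize(third_bases):
    """Summarize the bias of an NNx scheme (hypothetical helper)."""
    counts = codon_usage(third_bases)
    sense = {aa: n for aa, n in counts.items() if aa != "*"}
    return {
        "codons": sum(counts.values()),
        "stops": counts["*"],
        "amino_acids": len(sense),
        "most_to_least": max(sense.values()) / min(sense.values()),
        "acidic": counts["D"] + counts["E"],  # assumed grouping
        "basic": counts["K"] + counts["R"],   # assumed grouping
    }


nnk = summarize("GT")  # K = equal mixture of G and T
nns = summarize("GC")  # S = equal mixture of G and C
print("NNK:", nnk)
print("NNS:", nns)
```

Under these assumptions both schemes allow 32 codons that
encode all twenty amino acids plus one stop codon, the most
favored amino acids occur three times as often as the least
favored, and codons for basic residues outnumber codons for
acidic residues four to two.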

Devlin et al. characterized several
affinity-selected streptavidin-binding peptides, but did not
measure the affinity constants for these peptides. Cwirla
et al. did determine the affinity constants for their
peptides, but were disappointed to find that their best
hexapeptides had affinities (350-300nM), "orders of
magnitude" weaker than that of the native Met-enkephalin
epitope (7nM) recognized by the target antibody. Cwirla
et al. speculated that phage bearing peptides with higher
affinities remained bound under acidic elution, possibly
because of multivalent interactions between phage
(carrying about 4 copies of pIII) and the divalent target IgG.
Scott and Smith were able to find peptides whose affinity
for the target antibody (A2) was comparable to that of the
reference myohemerythrin epitope (50nM). However, Scott
and Smith likewise expressed concern that some
high-affinity peptides were lost, possibly through irreversible
binding of fusion phage to target.

G. Non-Commonly Owned Patents and Applications Naming
Robert Ladner as an Inventor 

Ladner, US Patent No. 4,704,692, "Computer Based
System and Method for Determining and Displaying Possible
Chemical Structures for Converting Double- or
Multiple-Chain Polypeptides to Single-Chain Polypeptides" describes
a design method for converting proteins composed of two or
more chains into proteins of fewer polypeptide chains, but
with essentially the same 3D structure. There is no
mention of variegated DNA and no genetic selection.
Ladner and Bird, WO88/01649 (publ. March 10, 1988)
disclose the specific application of computerized design
of linker peptides to the preparation of single chain
antibodies.

Ladner, Glick, and Bird, WO88/06630 (publ. 7 Sept.
1988 and having priority from US application 07/021,046,
assigned to Genex Corp.) (LGB) speculate that diverse
single chain antibody domains (SCAD) may be screened for
binding to a particular antigen by varying the DNA
encoding the combining determining regions of a single
chain antibody, subcloning the SCAD gene into the gpV gene
of phage lambda so that a SCAD/gpV chimera is displayed on
the outer surface of phage lambda, and selecting phage
which bind to the antigen through affinity chromatography.
The only antigen mentioned is bovine growth hormone. No
other binding molecules, targets, carrier organisms, or
outer surface proteins are discussed. Nor is there any
mention of the method or degree of mutagenesis.
Furthermore, there is no teaching as to the exact structure of
the fusion, nor of how to identify a successful fusion or
how to proceed if the SCAD is not displayed.

Ladner and Bird, WO88/06601 (publ. 7 September 1988)
suggest that single chain "pseudodimeric" repressors
(DNA-binding proteins) may be prepared by mutating a
putative linker peptide followed by in vivo selection, and that
mutation and selection may be used to create a dictionary
of recognition elements for use in the design of
asymmetric repressors. The repressors are not displayed on
the outer surface of an organism.

Methods of identifying residues in a protein which can
be replaced with a cysteine in order to promote the
formation of a protein-stabilizing disulfide bond are
given in Pantoliano and Ladner, U.S. Patent No. 4,903,773
(PANT90), Pantoliano and Ladner (PANT87), Pabo and
Suchanek (PABO86), MATS89, and SAUE86.

No admission is made that any cited reference is 
prior art or pertinent prior art, and the dates given are 
those appearing on the reference and may not be identical
to the actual publication date. All references cited in 
this specification are hereby incorporated by reference. 

SUMMARY OF THE INVENTION

The present invention is intended to overcome the
deficiencies discussed above. It relates to the
construction, expression, and selection of mutated genes that
specify novel proteins with desirable binding properties,
as well as these proteins themselves. The substances
bound by these proteins, hereinafter referred to as
"targets", may be, but need not be, proteins. Targets may
include other biological or synthetic macromolecules as
well as other organic and inorganic substances.

The fundamental principle of the invention is one of
forced evolution. In nature, evolution results from the
combination of genetic variation, selection for
advantageous traits, and reproduction of the selected
individuals, thereby enriching the population for the trait.
The present invention achieves genetic variation through
controlled random mutagenesis ("variegation") of DNA,
yielding a mixture of DNA molecules encoding different but
related potential binding proteins. It selects for
mutated genes that specify novel proteins with desirable
binding properties by 1) arranging that the product of
each mutated gene be displayed on the outer surface of a
replicable genetic package (GP) (a cell, spore or virus)
that contains the gene, and 2) using affinity selection
(selection for binding to the target material) to enrich
the population of packages for those packages containing
genes specifying proteins with improved binding to that
target material. Finally, enrichment is achieved by
allowing only the genetic packages which, by virtue of
the displayed protein, bound to the target, to reproduce.
The evolution is "forced" in that selection is for the
target material provided.
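
The effect of repeating selection and amplification can be
illustrated with a simple numerical model. The Python sketch
below is a toy calculation, not a description of any
experiment: the starting fraction of binders and the
per-round retention probabilities are hypothetical values,
and amplification is assumed to restore the population size
without changing the proportions of binders and non-binders.

```python
def enrich(fraction, keep_binder, keep_other):
    """Fraction of binding packages after one round of affinity selection.

    fraction    -- fraction of packages displaying a binding domain
    keep_binder -- probability that a binding package is retained
    keep_other  -- probability that a non-binding package is retained
    """
    bound = fraction * keep_binder
    background = (1.0 - fraction) * keep_other
    return bound / (bound + background)


def run_rounds(fraction, keep_binder, keep_other, rounds):
    """Apply selection repeatedly; amplification leaves fractions unchanged."""
    history = [fraction]
    for _ in range(rounds):
        fraction = enrich(fraction, keep_binder, keep_other)
        history.append(fraction)
    return history


# Hypothetical numbers: one binder per million packages, half of the
# binders retained per round, one non-binder in a thousand retained.
history = run_rounds(1e-6, 0.5, 1e-3, 3)
for n, f in enumerate(history):
    print(f"round {n}: fraction of binders = {f:.3g}")
```

In this model each round multiplies the odds of a package
being a binder by keep_binder / keep_other, so a rare binder
can come to dominate the population after a few cycles.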

The display strategy is first perfected by modifying
a genetic package to display a stable, structured domain
(the "initial potential binding domain", IPBD) for which
an affinity molecule (which may be an antibody) is
obtainable. The success of the modifications is readily
measured by, e.g., determining whether the modified
genetic package binds to the affinity molecule.

The IPBD is chosen with a view to its tolerance for
extensive mutagenesis. Once it is known that the IPBD can
be displayed on a surface of a package and subjected to
affinity selection, the gene encoding the IPBD is
subjected to a special pattern of multiple mutagenesis, here
termed "variegation", which after appropriate cloning and
amplification steps leads to the production of a
population of genetic packages each of which displays a single
potential binding domain (a mutant of the IPBD), but which
collectively display a multitude of different though
structurally related potential binding domains (PBDs).
Each genetic package carries the version of the pbd gene
that encodes the PBD displayed on the surface of that
particular package. Affinity selection is then used to
identify the genetic packages bearing the PBDs with the
desired binding characteristics, and these genetic
packages may then be amplified. After one or more cycles
of enrichment by affinity selection and amplification, the
DNA encoding the successful binding domains (SBDs) may
then be recovered from selected packages.

If need be, the DNA from the SBD-bearing packages may
then be further "variegated", using an SBD of the last
round of variegation as the "parental potential binding
domain" (PPBD) to the next generation of PBDs, and the
process continued until the worker in the art is satisfied
with the result. At that point, the SBD may be produced
by any conventional means, including chemical synthesis.

When the number of different amino acid sequences
obtainable by mutation of the domain is large when
compared to the number of different domains which are
displayable in detectable amounts, the efficiency of the
forced evolution is greatly enhanced by careful choice of
which residues are to be varied. First, residues of a
known protein which are likely to affect its binding
activity (e.g., surface residues) and not likely to unduly
degrade its stability are identified. Then all or some of
the codons encoding these residues are varied
simultaneously to produce a variegated population of DNA. The
variegated population of DNA is used to express a variety
of potential binding domains, whose ability to bind the
target of interest may then be evaluated.

The method of the present invention is thus further
distinguished from other methods in the nature of the
highly variegated population that is produced and from
which novel binding proteins are selected. We force the
displayed potential binding domain to sample the nearby
"sequence space" of related amino-acid sequences in an
efficient, organized manner. Four goals guide the
various variegation plans used herein, preferably: 1) a
very large number (e.g. 10^7) of variants is available, 2)
a very high percentage of the possible variants actually
appears in detectable amounts, 3) the frequency of
appearance of the desired variants is relatively uniform, and 4)
variation occurs only at a limited number of amino-acid
residues, most preferably at residues having side groups
directed toward a common region on the surface of the
potential binding domain.
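
The interplay between the number of varied residues and the
number of variants that can actually be sampled may be
sketched numerically. The following Python fragment is an
illustration under stated assumptions: it treats the figure
of 10^7 as the number of independent clones examined, assumes
all twenty amino acids are allowed and equally likely at each
varied residue, and uses a Poisson approximation for sampling.
The function names are hypothetical.

```python
import math


def protein_variants(n_residues, n_amino_acids=20):
    """Number of distinct sequences when n_residues are fully varied."""
    return n_amino_acids ** n_residues


def max_residues(library_size, n_amino_acids=20):
    """Largest number of varied residues whose variants fit in the library."""
    n = 0
    while protein_variants(n + 1, n_amino_acids) <= library_size:
        n += 1
    return n


def expected_coverage(library_size, n_variants):
    """Expected fraction of variants present at least once.

    Assumes equally likely variants and independent sampling
    (Poisson approximation).
    """
    return 1.0 - math.exp(-library_size / n_variants)


library = 10 ** 7  # assumed number of independent clones
n_max = max_residues(library)
print("residues that can be fully varied:", n_max)
for n in (n_max, n_max + 1):
    variants = protein_variants(n)
    coverage = expected_coverage(library, variants)
    print(f"{n} residues: {variants:.3g} variants, coverage {coverage:.3f}")
```

Under these assumptions five residues can be fully varied
with most variants represented, whereas a sixth varied
residue leaves most of the possible sequences unsampled.
Unequal codon representation would lower the coverage
further, which is why a uniform frequency of appearance is
listed as a separate goal.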

This is to be distinguished from the simple use of
indiscriminate mutagenic agents such as radiation and
hydroxylamine to modify a gene, where there is no (or very
oblique) control over the site of mutation. Many of the
mutations will affect residues that are not a part of the
binding domain. Moreover, since at a reasonable level of
mutagenesis, any modified codon is likely to be
characterized by a single base change, only a limited and
biased range of possibilities will be explored. Equally
remote is the use of site-specific mutagenesis techniques
employing mutagenic oligonucleotides of nonrandomized
sequence, since these techniques do not lend themselves to
the production and testing of a large number of variants.
While focused random mutagenesis techniques are known, the
importance of controlling the distribution of variation
has been largely overlooked.
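
The limitation imposed by single base changes can be made
concrete by listing, for a given codon, the amino acids
reachable through one substitution. The Python sketch below
is illustrative only; it assumes the standard genetic code,
and the function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reachable(codon):
    """Amino acids, other than the original, reachable by one base change."""
    original = CODON_TABLE[codon]
    found = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[mutant]
            if aa not in ("*", original):
                found.add(aa)
    return found


# Map every sense codon to its single-base-change substitutions.
reach = {c: reachable(c) for c, aa in CODON_TABLE.items() if aa != "*"}
best = max(len(s) for s in reach.values())
print("GCT (Ala) ->", sorted(reach["GCT"]))
print("most substitutions reachable from any one codon:", best)
```

Because each codon has only nine single-base neighbors, no
codon can reach more than nine of the nineteen possible
substitutions in this way, and which ones it reaches depends
on the particular codon used.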

In order to obtain the display of a multitude of
different though related potential binding domains,
applicants generate a heterogeneous population of
replicable genetic packages each of which comprises a hybrid
gene including a first DNA sequence which encodes a
potential binding domain for the target of interest and a
second DNA sequence which encodes a display means, such as
an outer surface protein native to the genetic package but
not natively associated with the potential binding domain
(or the parental binding domain to which it is related)
which causes the genetic package to display the
corresponding chimeric protein (or a processed form thereof) on
its outer surface.

It should be recognized that by expressing a hybrid
protein which comprises an outer surface transport signal
not natively associated with the binding domain, the
utility of the present invention is greatly extended. The
binding domain need not be that of a surface protein of
the genetic package (or, in the case of a viral package,
of its host cell), since the provided outer surface
transport signal is responsible for achieving the desired
display. Thus, it is possible to display on the surface
of a phage, bacterial cell or bacterial spore a binding
domain related to the binding domain of a normally
cytoplasmic binding protein, or the binding domain of a
eukaryotic protein which is not found on the surface of
prokaryotic cells or viruses.

Another important aspect of the invention is that
each potential binding domain remains physically
associated with the particular DNA molecule which encodes it.
Thus, once successful binding domains are identified, one
may readily recover the gene and either express additional
quantities of the novel binding protein or further mutate
the gene. The form that this association takes is a
"replicable genetic package", a virus, cell or spore which
replicates and expresses the binding domain-encoding gene,
and transports the binding domain to its outer surface.

It is also possible to modify the PBDs chemically or
enzymatically before selection. The selection then
identifies the best modified amino acid sequence. For
example, we could treat the variegated population of
genetic packages that display a variegated population of
binding domains with a protein tyrosine kinase and then
select for binding the target. Any tyrosines on the BD
surface will be phosphorylated and this could affect the
binding properties. Other chemical or enzymatic modifications
are possible.

By virtue of the present invention, proteins are
obtained which can bind specifically to targets other than
the antigen-combining sites of antibodies. A protein is
not to be considered a "binding protein" merely because it
can be bound by an antibody (see definition of "binding
protein" which follows). While almost any amino acid
sequence of more than about 6-8 amino acids is likely,
when linked to an immunogenic carrier, to elicit an immune
response, any given random polypeptide is unlikely to
satisfy the stringent definition of "binding protein" with
respect to minimum affinity and specificity for its
substrate. It is only by testing numerous random polypeptides
simultaneously (and, in the usual case, controlling
the extent and character of the sequence variation, i.e.,
limiting it to residues of a potential binding domain
having a stable structure, the residues being chosen as
more likely to affect binding than stability) that this
obstacle is overcome.

In one embodiment, the invention relates to: 

a) preparing a variegated population of replicable 
genetic packages, each package including a nucleic 
acid construct coding for an outer-surface-displayed 
potential binding protein other than an antibody, 
comprising (i) a structural signal directing the 
display of the protein (or a processed form thereof) 
on the outer surface of the package and (ii) a 
potential binding domain for binding said target, 
where the population collectively displays a multi- 
tude of different potential binding domains having a 
substantially predetermined range of variation in 
sequence, 

b) causing the expression of said protein and the
display of said protein on the outer surface of such
packages,

c) contacting the packages with target material, other 
than an antibody with an exposed antigen-combining 
site, so that the potential binding domains of the 
proteins and the target material may interact, and 
separating packages bearing a potential binding 
domain that succeeds in binding the target material 
from packages that do not so bind, 

d) recovering and replicating at least one package 
bearing a successful binding domain, 

e) determining the amino acid sequence of the successful
binding domain of a genetic package which bound to
the target material,

f) preparing a new variegated population of replicable
genetic packages according to step (a), the parental
potential binding domain for the potential binding
domains of said new packages being a successful
binding domain whose sequence was determined in step
(e), and repeating steps (b)-(e) with said new
population, and, when a package bearing a binding
domain of desired binding characteristics is obtained,

g) abstracting the DNA encoding the desired binding
domain from the genetic package and placing it into a
suitable expression system. (The binding domain may
then be expressed as a unitary protein, or as a
domain of a larger protein.)
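
The control flow of steps (a) through (g) can be sketched in code. The following Python is only a toy simulation under stated assumptions, not a laboratory protocol: the scoring function `affinity`, the helper names `variegate` and `directed_evolution`, the library size of 500, and the idea of scoring a domain by counting matches to a hidden "ideal" string are all hypothetical and serve only to make the loop concrete.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def affinity(domain, ideal):
    """Toy stand-in for binding strength (assumption): count matching positions."""
    return sum(a == b for a, b in zip(domain, ideal))


def variegate(parent, positions, size, rng):
    """Steps (a)/(f): vary only the chosen residues of the parental domain."""
    library = {parent}  # the parent itself stays in the population
    for _ in range(size):
        residues = list(parent)
        for p in positions:
            residues[p] = rng.choice(AMINO_ACIDS)
        library.add("".join(residues))
    return sorted(library)


def directed_evolution(parent, ideal, positions, goal, max_rounds=20, seed=0):
    """Repeat variegation and selection until the goal affinity is reached."""
    rng = random.Random(seed)
    for _ in range(max_rounds):
        if affinity(parent, ideal) >= goal:
            break
        library = variegate(parent, positions, size=500, rng=rng)  # (a), (b)
        # (c)-(e): "selection" here is simply picking the best scorer
        best = max(library, key=lambda d: affinity(d, ideal))
        parent = best  # (f): the successful domain becomes the new parent
    return parent  # (g): sequence to move into an expression system


if __name__ == "__main__":
    print(directed_evolution("AAAAAAAA", "AKWAAYAA", [1, 2, 5], goal=8))
```

Because the parent is retained in each library, the toy score never decreases from one round to the next, and residues outside the chosen positions are never altered, which mirrors the restriction of variation to predetermined codons.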

The invention is not, however, limited to proteins
with a single BD, since the method may be applied to any or
all of the BDs of the protein, sequentially or simultaneously.
Nor is the invention limited to biological
synthesis of the binding domains; peptides having an
amino-acid sequence determined by the isolated DNA can be
chemically synthesized.

The invention further relates to a variegated
population of genetic packages. Said population may be
used by one user to select for binding to a first target,
by a second user to select for binding to a second target,
and so on, as the present invention does not require that
the initial potential binding domain actually bind to the
target of interest, and the variegation is at residues
likely to affect binding. The invention also relates to
the variegated DNA used in preparing such genetic packages.

The invention likewise encompasses the procedure by
which the display strategy is verified. The genetic
packages are engineered to display a single IPBD sequence.
(Variability may be introduced into DNA subsequences
adjacent to the ipbd subsequence and within the
osp-ipbd gene so that the IPBD will appear on the GP
surface.) A molecule, such as an antibody, having high
affinity for correctly folded IPBD is used to: a) detect
IPBD on the GP surface, b) screen colonies for display of
IPBD on the GP surface, or c) select GPs that display IPBD
from a population, some members of which might display
IPBD on the GP surface. In one preferred embodiment, this
verification process (Part I) involves:

1) choosing a GP such as a bacterial cell, bacterial
spore, or phage, having a suitable outer surface
protein (OSP),

2) choosing a stable IPBD,

3) designing an amino acid sequence that: a) includes
the IPBD as a subsequence and b) will cause the IPBD
to appear on the GP surface,

4) engineering a gene, denoted osp-ipbd, that: a) codes
for the designed amino acid sequence, b) provides the
necessary genetic regulation, and c) introduces
convenient sites for genetic manipulation,

5) cloning the osp-ipbd gene into the GP, and

6) harvesting the transformed GPs and testing them for
presence of IPBD on the GP surface; this test is
performed with an affinity molecule having high
affinity for IPBD, denoted AfM(IPBD).

Once a GP(IPBD) is produced, it can be used many 
times as the starting point for developing different novel 
proteins that bind to a variety of different targets. The 
knowledge of how we engineer the appearance of one IPBD on 
the surface of a GP can be used to design and produce 
other GP(IPBD)s that display different IPBDs.

Knowing that a particular genetic package and osp-ipbd
fusion are suitable for the practice of the invention,
we may variegate the genetic packages and select for
binding to a target of interest. Using IPBD as the PPBD
to the first cycle of variegation, we prepare a wide
variety of osp-pbd genes that encode a wide variety of
PBDs. We use an affinity separation to enrich the
population of GP(vgPBD)s for GPs that display PBDs with
binding properties relative to the target that are
superior to the binding properties of the PPBD. An SBD
selected from one variegation cycle becomes the PPBD to
the next variegation cycle. In a preferred embodiment,
Part II of the process of the present invention involves:

1) picking a target molecule, and an affinity separation
system which selects for proteins having an affinity
for that target molecule,

2) picking a GP(IPBD),

3) picking a set of several residues in the PPBD to
vary; the principal indicators of which residues to
vary include: a) the 3D structure of the IPBD, b)
sequences of homologous proteins, and c) computer or
theoretical modeling that indicates which residues
can tolerate different amino acids without disrupting
the underlying structure,

4) picking a subset of the residues picked in Part II.3,
to be varied simultaneously; the principal considerations
are the number of different variants and which
variants are within the detection capabilities of the
affinity separation system, and setting the range of
variation;

5) implementing the variegation by:

a) synthesizing the part of the osp-pbd gene that
encodes the residues to be varied using a
specific mixture of nucleotide substrates for
some or all of the bases encoding residues
slated for variation, thereby creating a
population of DNA molecules, denoted vgDNA,

b) ligating this vgDNA, by standard methods, into
the operative cloning vector (OCV) (e.g. a
plasmid or bacteriophage),

c) using the ligated DNA to transform cells,
thereby producing a population of transformed
cells,

d) culturing (i.e. increasing in number) the
population of transformed cells and harvesting
the population of GP(PBD)s, said population
being denoted as GP(vgPBD),

e) enriching the population for GPs that bind the
target by using affinity separation, with the
chosen target molecule as affinity molecule,

f) repeating steps II.5.d and II.5.e until a
GP(SBD) having improved binding to the target is
isolated, and

g) testing the isolated SBD or SBDs for affinity
and specificity for the chosen target,

6) repeating steps II.3, II.4, and II.5 until the
desired degree of binding is obtained.

Part II is repeated for each new target material.
Part I need be repeated only if no GP(IPBD) suitable to a
chosen target is available.

For each target, there are a large number of SBDs
that may be found by the method of the present invention.
The process relies on a combination of protein structural
considerations, probabilities, and targeted mutations with
accumulation of information. To increase the probability
that some PBD in the population will bind to the target,
we generate as large a population as we can conveniently
subject to selection-through-binding in one experiment.
Key questions in management of the method are "How many
transformants can we produce?", and "How small a component
can we find through selection-through-binding?". The
optimum level of variegation is determined by the maximum
number of transformants and the selection sensitivity, so
that for any reasonable sensitivity we may use a progressive
process to obtain a series of proteins with higher
and higher affinity for the chosen target material.
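
The two "key questions" above bound how many residues can usefully be varied at once. As a rough illustration, the sketch below assumes that every varied residue is allowed all 20 amino acids, so that n varied residues give 20^n protein variants, and that the library should not exceed either the number of transformants or the "1 in N" that selection can find. The function name `max_varied_residues` and the numbers in the example are hypothetical, not values taken from this specification.

```python
def max_varied_residues(max_transformants, one_in_n_sensitivity, alphabet=20):
    """Largest n such that alphabet**n fits within both limits.

    max_transformants: how many independent transformants can be produced.
    one_in_n_sensitivity: the N in "selection can find 1 in N".
    alphabet: amino acids allowed per varied residue (assumed 20 here).
    """
    limit = min(max_transformants, one_in_n_sensitivity)
    n = 0
    while alphabet ** (n + 1) <= limit:
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative numbers only: 10**8 transformants, selection finds 1 in 10**7.
    print(max_varied_residues(10**8, 10**7))
```

Restricting the substitution set at some residues (a smaller `alphabet`) lets more residues be varied within the same limits, which is one reason the range of variation is set residue by residue.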

The appended claims are hereby incorporated by
reference into this specification as an enumeration of the
preferred embodiments.



BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows how a phage may be used as a genetic
package. At (a) we have a wild-type precoat protein
lodged in the lipid bilayer. The signal peptide is
in the periplasmic space. At (b), a chimeric precoat
protein, with a potential binding domain interposed
between the signal peptide and the mature coat
protein sequence, is similarly trapped. At (c) and
(d), the signal peptide has been cleaved off the
wild-type and chimeric proteins, respectively, but
certain residues of the coat protein sequence
interact with the lipid bilayer to prevent the
mature protein from passing entirely into the
periplasm. At (e) and (f), mature wild-type and
chimeric protein are assembled into the coat of a
single stranded DNA phage as it emerges into the
periplasmic space. The phage will pass through the
outer membrane into the medium where it can be
recovered and chromatographically evaluated.

Figure 2 depicts (a) the optimal stereochemistry of a
disulfide bond, based on Creighton, "Disulfide Bonds
and Protein Stability" (CREI88) (the two possible
torsion angles about the disulfide bond of +90° and
-90° are equally likely), and (b) the standard
geometric parameters for the disulfide bond, following
Katz and Kossiakoff (KATZ86). The average Cα-Cα
distance is 5-6 Å, and the typical S-S bond length is
about 2.0 Å. Many left-hand disulfides adopt as a
preferred geometry χ1=-60°, χ2=-60°, χ3=-85°,
χ2'=-60°, χ1'=-60°, Cα-Cα = 5.88 Å; right-hand disulfides
are more variable.

Figure 3 shows a mini-protein comprising eight residues,
numbered 4 through 11 and in which residues 5 and 10
are joined by a disulfide. The β carbons are
labeled for residues 4, 6, 7, 8, 9, and 11; these
residues are preferred sites of variegation.

Figure 4 shows the Cα's of the coat protein of phage f1.

Figure 5 shows the construction of M13-MB51.

Figure 6 shows construction of MK-BPTI, also known as
BPTI-III MK.

Figure 7 illustrates fractionation of the Mini PEPI
library on HNE beads. The abscissa shows pH of
buffer. The ordinate shows amount of phage (as
fraction of input phage) obtained at given pH.
Ordinates scaled by $10^{3}$.

Figure 8 illustrates fractionation of the MYMUT PEPI
library on HNE beads. The abscissa shows pH of
buffer. The ordinate shows amount of phage (as
fraction of input phage) obtained at given pH.
Ordinates scaled by $10^{3}$.

Figure 9 shows the elution profiles for EpiNE clones 1,
3, and 7. Each profile is scaled so that the peak is
1.0 to emphasize the shape of the curve.

Figure 10 shows pH profile for the binding of BPTI-III MK
and EpiNE1 on cathepsin G beads. The abscissa shows
pH of buffer. The ordinate shows amount of phage (as
fraction of input phage) obtained at given pH.
Ordinates scaled by $10^{3}$.

Figure 11 shows pH profile for the fractionation of the
MYMUT library on cathepsin G beads. The abscissa
shows pH of buffer. The ordinate shows amount of
phage (as fraction of input phage) obtained at given
pH. Ordinates scaled by $10^{3}$.

Figure 12 shows a second fractionation of MYMUT library
over cathepsin G.

Figure 13 shows elution profiles on immobilized cathepsin
G for phage selected for binding to cathepsin G.

Figure 14 shows the Cα's of BPTI and interaction set #2.

Figure 15 shows the main chain of scorpion toxin (Brookhaven
Protein Data Bank entry 1SN3), residues 20
through 42. CYS25 and CYS41 are shown forming a
disulfide. In the native protein these groups form
disulfides to other cysteines, but no main-chain
motion is required to bring the gamma sulphurs into
acceptable geometry. Residues, other than GLY, are
labeled at the β carbon with the one-letter code.

Figure 16 shows profiles of the elution of phage that
display EpiNE7 and EpiNE7.23 from HNE beads.

III. VARIEGATION STRATEGY - MUTAGENESIS TO OBTAIN
POTENTIAL BINDING DOMAINS WITH DESIRED DIVERSITY

C. Determining the Substitution Set for Each
Parental Residue

D. Special Considerations Relating to Variegation
of Mini-Proteins with Essential Cysteines
E. Planning the Second and Later Rounds of
Variegation

I. DEFINITIONS AND ABBREVIATIONS

Let $K_d(x,y)$ be a dissociation constant,

$$K_d(x,y) = \frac{[x]\,[y]}{[x{:}y]}$$



For the purposes of the appended claims, a protein P is a
binding protein if (1) for one molecular, ionic or atomic
species A, other than the variable domain of an antibody,
the dissociation constant $K_d(P,A) < 10^{-6}$ moles/liter
(preferably, $< 10^{-7}$ moles/liter), and (2) for a different
molecular, ionic or atomic species B, $K_d(P,B) > 10^{-4}$
moles/liter (preferably, $> 10^{-1}$ moles/liter). As a result
of these two conditions, the protein P exhibits specificity
for A over B, and a minimum degree of affinity (or
avidity) for A.

The exclusion of "variable domain of an antibody" in
(1) above is intended to make clear that for the purposes
herein a protein is not to be considered a "binding
protein" merely because it is antigenic. However, an
antigen may nonetheless qualify as a binding protein
because it specifically binds to a substance other than an
antibody, e.g., an enzyme for its substrate, or a hormone
for its cellular receptor. Additionally, it should be
pointed out that "binding protein" may include a protein
which binds specifically to the Fc of an antibody, e.g.,
staphylococcal protein A.
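
The two numerical conditions in the definition of "binding protein" can be expressed as a short check. The Python below is a minimal sketch: the function names `dissociation_constant` and `is_binding_protein` are hypothetical, the thresholds are the non-preferred limits stated above, and the exclusion of antibody variable domains as species A is a qualitative condition that the code does not attempt to model.

```python
def dissociation_constant(conc_x, conc_y, conc_complex):
    """K_d(x,y) = [x][y] / [x:y], with all concentrations in moles/liter."""
    if conc_complex <= 0:
        raise ValueError("complex concentration must be positive")
    return conc_x * conc_y / conc_complex


def is_binding_protein(kd_a, kd_b):
    """Apply conditions (1) and (2) to K_d values in moles/liter.

    kd_a: K_d(P, A) for the species A that P is meant to bind.
    kd_b: K_d(P, B) for some different species B.
    """
    binds_a = kd_a < 1e-6          # minimum affinity for A
    rejects_b = kd_b > 1e-4        # specificity for A over B
    return binds_a and rejects_b


if __name__ == "__main__":
    print(is_binding_protein(kd_a=1e-9, kd_b=1e-3))   # tight and specific
    print(is_binding_protein(kd_a=1e-5, kd_b=1e-3))   # too weak for A
```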

Normally, the binding protein will not be an antibody
or an antigen-binding derivative thereof. An antibody is a
crosslinked complex of four polypeptides (two heavy and
two light chains). The light chains of IgG have a
molecular weight of about 23,000 daltons and the heavy chains
of about 53,000 daltons. A single binding unit is composed of
the variable region of a heavy chain (VH) and the variable
region of a light chain (VL), each about 110 amino-acid
residues. The VH and VL regions are held in proximity by
a disulfide bond between the adjoining CL and CH1 regions;
altogether, these total 440 residues and correspond to an
Fab fragment. Derivatives of antibodies include Fab
fragments and the individual variable light and heavy
domains. A special case of antibody derivative is a
"single chain antibody." A "single-chain antibody" is a
single chain polypeptide comprising at least 200 amino
acids, said amino acids forming two antigen-binding
regions connected by a peptide linker that allows the two
regions to fold together to bind the antigen in a manner
akin to that of an Fab fragment. Either the two antigen-binding
regions must be variable domains of known antibodies,
or they must (1) each fold into a β barrel of
nine strands that are spatially related in the same way
as are the nine strands of known antibody variable light
or heavy domains, and (2) fit together in the same way as
do the variable domains of said known antibody. Generally
speaking, this will require that, with the exception of
the amino acids corresponding to the hypervariable region,
there is at least 88% homology with the amino acids of the
variable domain of a known antibody.

While the present invention may be used to develop
novel antibodies through variegation of codons corresponding
to the hypervariable region of an antibody's variable
domain, its primary utility resides in the development of
binding proteins which are not antibodies or even variable
domains of antibodies. Novel antibodies can be obtained
by immunological techniques; novel enzymes, hormones, etc.
cannot.

It will be appreciated that, as a result of evolution,
the antigen-binding domains of antibodies have
acquired a structure which tolerates great variability of
sequence in the hypervariable regions. The remainder of
the variable domain is made up of constant regions forming
a distinctive structure, a nine strand β barrel, which
holds the hypervariable regions (inter-strand loops) in a
fixed relationship with each other. Most other binding
proteins lack this molecular design which facilitates
diversification of binding characteristics. Consequently,
the successful development of novel antibodies by modification
of sequences encoding known hypervariable regions
(which, in nature, vary from antibody to antibody) does not
provide any guidance or assurance of success in the
development of novel, non-immunoglobulin binding proteins.

It should further be noted that the affinity of
antibodies for their target epitopes is typically on the
order of $10^{6}$ to $10^{10}$ liters/mole; many enzymes exhibit
much greater affinities ($10^{9}$ to $10^{15}$ liters/mole) for
their preferred substrates. Thus, if the goal is to
develop a binding protein with a very high affinity for a
target of interest, e.g., greater than $10^{10}$ liters/mole, the
antibody design may in fact be unduly limiting. Furthermore, the
complementarity-determining regions of an antibody
comprise many residues, 30 to 50. In most cases, it is
not known which of these residues participate directly in
binding antigen. Thus, picking an antibody as PPBD does
not allow us to focus variegation to a small number of
residues.
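
The affinities quoted above are in liters/mole, which are the units of an association constant, whereas the definition of "binding protein" uses dissociation constants in moles/liter. For a simple bimolecular equilibrium the two are reciprocals, so the figures can be compared directly. The helper below is a minimal sketch; the name `association_to_dissociation` is hypothetical.

```python
def association_to_dissociation(k_a_liters_per_mole):
    """Convert an association constant (liters/mole) to K_d (moles/liter)."""
    if k_a_liters_per_mole <= 0:
        raise ValueError("association constant must be positive")
    return 1.0 / k_a_liters_per_mole


if __name__ == "__main__":
    # An affinity of 10**10 liters/mole corresponds to K_d = 10**-10 moles/liter.
    print(association_to_dissociation(1e10))
```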

Most larger proteins fold into distinguishable
globules called domains (ROSS81). Protein domains have
been defined various ways, but all definitions fall into
one of three classes: a) those that define a domain in
terms of 3D atomic coordinates, b) those that define a
domain as an isolable, stable fragment of a larger
protein, and c) those that define a domain based on
protein sequence homology plus a method from class a) or
b). Frequently, different methods of defining domains
applied to a single protein yield identical or very
similar domain boundaries. The diversity of definitions
for domains stems from the many ways that protein domains
are perceived to be important, including the concept of
domains in predicting the boundaries of stable fragments,
and the relationship of domains to protein folding,
function, stability, and evolution. The present invention
emphasizes the retention of the structured character of a
domain even though its surface residues are mutated.
Consequently, definitions of "domain" which emphasize
stability (retention of the overall structure in the
face of perturbing forces such as elevated temperatures or
chaotropic agents) are favored, though atomic coordinates
and protein sequence homology are not completely
ignored.

When a domain of a protein is primarily responsible
for the protein's ability to specifically bind a chosen
target, it is referred to herein as a "binding domain"
(BD). A preliminary operation is to engineer the appearance
of a stable protein domain, denoted as an "initial
potential binding domain" (IPBD), on the surface of a
genetic package.

The term "variegated DNA" (vgDNA) refers to a mixture
of DNA molecules of the same or similar length which, when
aligned, vary at some codons so as to encode at each such
codon a plurality of different amino acids, but which
encode only a single amino acid at other codon positions.
It is further understood that in variegated DNA, the
codons which are variable, and the range and frequency of
occurrence of the different amino acids which a given
variable codon encodes, are determined in advance by the
synthesizer of the DNA, even though the synthetic method
does not allow one to know, a priori, the sequence of any
individual DNA molecule in the mixture. The number of
designated variable codons in the variegated DNA is
preferably no more than 20 codons, and more preferably no
more than 5-10 codons. The mix of amino acids encoded at
each variable codon may differ from codon to codon.
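
To see how the range and frequency of amino acids at a variable codon follow from the nucleotide mixture chosen for each base, consider the sketch below. It assumes the standard genetic code and an equimolar mixture of the listed bases at each position; the function name `encoded_amino_acids` and the example mixture (any base at positions one and two, C or G at position three) are illustrative choices, not a mixture prescribed by this specification.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def encoded_amino_acids(variegated_codon):
    """Count codons per amino acid for one variegated codon.

    variegated_codon: three strings, each listing the bases allowed at that
    position (assumed equimolar). '*' in the result denotes a stop codon.
    """
    counts = {}
    for codon in product(*variegated_codon):
        aa = CODON_TABLE["".join(codon)]
        counts[aa] = counts.get(aa, 0) + 1
    return counts


if __name__ == "__main__":
    counts = encoded_amino_acids(("ACGT", "ACGT", "CG"))
    total = sum(counts.values())
    for aa in sorted(counts):
        print(aa, counts[aa], f"{counts[aa] / total:.3f}")
```

A fixed (non-variable) codon is simply the special case in which each position lists one base, for example `("G", "G", "T")`.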

A population of genetic packages into which variegated
DNA has been introduced is likewise said to be
"variegated".

For the purposes of this invention, the term "potential
binding protein" refers to a protein encoded by one
species of DNA molecule in a population of variegated DNA
wherein the region of variation appears in one or more
subsequences encoding one or more segments of the polypeptide
having the potential of serving as a binding domain
for the target substance.

From time to time, it may be helpful to speak of the
"parent sequence" of the variegated DNA. When the novel
binding domain sought is an analogue of a known binding
domain, the parent sequence is the sequence that encodes
the known binding domain. The variegated DNA will be
identical with this parent sequence at one or more loci,
but will diverge from it at chosen loci. When a potential
binding domain is designed from first principles, the
parent sequence is a sequence which encodes the amino acid
sequence that has been predicted to form the desired
binding domain, and the variegated DNA is a population of
"daughter DNAs" that are related to that parent by a
recognizable sequence similarity.

A "chimeric protein" is a protein composed of a first
amino acid sequence substantially corresponding to the
sequence of a protein or to a large fragment of a protein
(20 or more residues) expressed by the species in which
the chimeric protein is expressed and a second amino acid
sequence that does not substantially correspond to an
amino acid sequence of a protein expressed by the first
species but that does substantially correspond to the
sequence of a protein expressed by a second and different
species of organism. The second sequence is said to be
foreign to the first sequence.

One amino acid sequence of the chimeric proteins of
the present invention is typically derived from an outer
surface protein of a "genetic package" as hereafter
defined. The second amino acid sequence is one which, if
expressed alone, would have the characteristics of a
protein (or a domain thereof) but is incorporated into the
chimeric protein as a recognizable domain thereof. It may
appear at the amino or carboxy terminal of the first amino
acid sequence (with or without an intervening spacer), or
it may interrupt the first amino acid sequence. The first
amino acid sequence may correspond exactly to a surface
protein of the genetic package, or it may be modified,
e.g., to facilitate the display of the binding domain.

In the present invention, the words "select" and
"selection" are used in the genetic sense; i.e. a biological
process whereby a phenotypic characteristic is used
to enrich a population for those organisms displaying the
desired phenotype.

One affinity separation is called a "separation
cycle"; one pass of variegation followed by as many
separation cycles as are needed to isolate an SBD is
called a "variegation cycle". The amino acid sequence of
one SBD from one variegation cycle becomes the PPBD to the next
variegation cycle. We perform variegation cycles iteratively
until the desired affinity and specificity of
binding between an SBD and chosen target are achieved.

The following abbreviations will be used throughout
the present specification:

Abbreviation    Meaning

GP              Genetic Package, e.g. a bacteriophage
wtGP            Wild-type GP
X               Any protein
x               The gene for protein X
BD              Binding Domain
BPTI            Bovine pancreatic trypsin inhibitor, identical to
                aprotinin (Merck Index, entry 784, p. 119)
IPBD            Initial Potential Binding Domain, e.g. BPTI
PBD             Potential Binding Domain, e.g. a derivative of BPTI
SBD             Successful Binding Domain, e.g. a derivative of
                BPTI selected for binding to a target
PPBD            Parental Potential Binding Domain, i.e. an IPBD or
                an SBD from a previous selection
OSP             Outer Surface Protein, e.g. coat protein of a phage
                or LamB from E. coli
OSP-PBD         Fusion of an OSP and a PBD, order of fusion not
                specified
OSTS            Outer Surface Transport Signal
GP(x)           A genetic package containing the x gene
GP(X)           A genetic package that displays X on its outer
                surface
GP(osp-pbd)     GP containing an osp-pbd gene
GP(OSP-PBD)     A genetic package that displays PBD on its outside
                as a fusion to OSP
GP(pbd)         GP containing a pbd gene, osp implicit
GP(PBD)         A genetic package displaying PBD on its outside,
                OSP unspecified
{Q}             An affinity matrix supporting "Q", e.g.
                {T4 lysozyme} is T4 lysozyme attached to an
                affinity matrix
AfM(W)          A molecule having affinity for "W", e.g. trypsin is
                an AfM(BPTI)
AfM(W)*         AfM(W) carrying a label, e.g. 125I
XINDUCE         A chemical that can induce expression of a gene,
                e.g. IPTG for the lacUV5 promoter
OCV             Operative Cloning Vector
K_d             A bimolecular dissociation constant,
                K_d = [A][B]/[A:B]
K_T             K_T = [T][SBD]/[T:SBD] (T is a target)
K_N             K_N = [N][SBD]/[N:SBD] (N is a non-target)
DoAMoM          Density of AfM(W) on affinity matrix
mfaa            Most-Favored amino acid
lfaa            Least-Favored amino acid
Abun(x)         Abundance of DNA molecules encoding amino acid x
OMP             Outer membrane protein
nt              nucleotide
SP-I            Signal-sequence Peptidase I
Y_DQ            Yield of ssDNA up to Q bases long
L_DNA           Maximum length of ssDNA that can be synthesized in
                acceptable yield
Y_pl            Yield of plasmid DNA per volume of culture
L_eff           DNA ligation efficiency
M_ntv           Maximum number of transformants produced from
                Y_D100 DNA of insert
C_eff           Efficiency of chromatographic enrichment,
                enrichment per pass
C_sensi         Sensitivity of chromatographic separation, can find
                1 in N
N_chrom         Maximum number of enrichment cycles per variegation
                cycle
S_err           Error level in synthesizing vgDNA
::              in-frame genetic fusion or protein produced from
                in-frame fused gene

Single-letter codes for amino acids and nucleotides are
given in Table 1.

*** 

II. THE INITIAL POTENTIAL BINDING DOMAIN (IPBD):
II.A. Generally

The initial potential binding domain may be: 1) a
domain of a naturally occurring protein, 2) a non-naturally
occurring domain which substantially corresponds in
sequence to a naturally occurring domain, but which
differs from it in sequence by one or more substitutions,
insertions or deletions, 3) a domain substantially
corresponding in sequence to a hybrid of subsequences of
two or more naturally occurring proteins, or 4) an artificial
domain designed entirely on theoretical grounds
based on knowledge of amino acid geometries and statistical
evidence of secondary structure preferences of
amino acids. (However, the limitations of a priori
protein design prompted the present invention.) Usually,
the domain will be a known binding domain, or at least a
homologue thereof, but it may be derived from a protein
which, while not possessing a known binding activity,
possesses a secondary or higher structure that lends
itself to binding activity (clefts, grooves, etc.). The
protein to which the IPBD is related need not have any
specific affinity for the target material.

In determining whether sequences should be deemed to
"substantially correspond", one should consider the
following issues: the degree of sequence similarity when
the sequences are aligned for best fit according to
standard algorithms, the similarity in the connectivity
patterns of any crosslinks (e.g., disulfide bonds), the
degree to which the proteins have similar three-dimensional
structures, as indicated by, e.g., X-ray diffraction
analysis or NMR, and the degree to which the sequenced
proteins have similar biological activity. In
this context, it should be noted that among the serine
protease inhibitors, there are families of proteins
recognized to be homologous in which there are pairs of
members with as little as 30% sequence homology.
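
The first of these criteria, sequence similarity after alignment, is commonly summarized as a percent identity. The sketch below assumes the two sequences have already been aligned by some standard algorithm (gaps written as "-") and counts identical residues over the positions where neither sequence has a gap. Other conventions for the denominator exist, and the function name `percent_identity` and the example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over ungapped aligned positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip positions where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Made-up aligned fragments, for illustration only.
    print(round(percent_identity("ACDE-FG", "ACQEWFG"), 1))
```

A single number of this kind addresses only the first criterion; crosslink connectivity, three-dimensional structure, and biological activity have to be weighed separately.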

A candidate IPBD should meet the following criteria:

1) a domain exists that will remain stable under the
conditions of its intended use (the domain may
comprise the entire protein that will be inserted,
e.g. BPTI, α-conotoxin GI, or CMTI-III),

2) knowledge of the amino acid sequence is obtainable,
and

3) a molecule is obtainable having specific and high
affinity for the IPBD, AfM(IPBD).

Preferably, in order to guide the variegation strategy,
knowledge of the identity of the residues on the domain's
outer surface, and their spatial relationships, is
obtainable; however, this consideration is less important
if the binding domain is small, e.g., under 40 residues.

Preferably, the IPBD is no larger than necessary
because small SBDs (for example, less than 30 amino acids)
can be chemically synthesized and because it is easier to
arrange restriction sites in smaller amino-acid sequences.
For PBDs smaller than about 40 residues, an added advantage
is that the entire variegated pbd gene can be
synthesized in one piece. In that case, we need arrange
only suitable restriction sites in the osp gene. A
smaller protein minimizes the metabolic strain on the GP
or the host of the GP. The IPBD is preferably smaller
than about 200 residues. The IPBD must also be large
enough to have acceptable binding affinity and specificity.
For an IPBD lacking covalent crosslinks, such as
disulfide bonds, the IPBD is preferably at least 40
residues; it may be as small as six residues if it
contains a crosslink. These small, crosslinked IPBDs,
known as "mini-proteins", are discussed in more detail
later in this section.
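By way of illustration only, the size preferences just stated can be written as a small check. This is a minimal sketch: the function name `ipbd_size_ok` is hypothetical, and the approximate limits in the text ("about 200", "at least 40", "as small as six") are treated here as hard cut-offs, which is an assumption.

```python
def ipbd_size_ok(n_residues: int, has_covalent_crosslink: bool) -> bool:
    """Return True if a candidate length falls in the preferred IPBD window.

    Assumption: the approximate limits in the text are applied as exact
    thresholds (six or 40 residues at the low end, 200 at the high end).
    """
    # A crosslinked domain may be as small as six residues; otherwise
    # at least 40 residues are preferred.
    min_len = 6 if has_covalent_crosslink else 40
    # The IPBD is preferably smaller than about 200 residues.
    return min_len <= n_residues < 200


if __name__ == "__main__":
    # Residue counts taken from the text: BPTI (58), alpha-conotoxin GI (13).
    print(ipbd_size_ok(58, True))
    print(ipbd_size_ok(13, True))
    print(ipbd_size_ok(30, False))
```

The check says nothing about binding affinity or specificity; it only screens on length.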

Some candidate IPBDs, which meet the conditions set
forth above, will be more suitable than others. Information
about candidate IPBDs that will be used to judge the
suitability of the IPBD includes: 1) a 3D structure
(knowledge strongly preferred), 2) one or more sequences
homologous to the IPBD (the more homologous sequences
known, the better), 3) the pI of the IPBD (knowledge
desirable when target is highly charged), 4) the stability
and solubility as a function of temperature, pH and ionic
strength (preferably known to be stable over a wide range
and soluble in conditions of intended use), 5) ability to
bind metal ions such as Ca++ or Mg++ (knowledge preferred;
binding per se, no preference), 6) enzymatic activities,
if any (knowledge preferred, activity per se has uses but
may cause problems), 7) binding properties, if any
(knowledge preferred, specific binding also preferred), 8)
availability of a molecule having specific and strong
affinity (K_d < 10^-11 M) for the IPBD (preferred), 9)
availability of a molecule having specific and medium
affinity (10^-8 M < K_d < 10^-6 M) for the IPBD (preferred),
10) the sequence of a mutant of IPBD that does not bind to
the affinity molecule(s) (preferred), and 11) absorption
spectrum in visible, UV, NMR, etc. (characteristic
absorption preferred).

If only one species of molecule having affinity for
IPBD (AfM(IPBD)) is available, it will be used to: a)
detect the IPBD on the GP surface, b) optimize expression
level and density of the affinity molecule on the matrix,
and c) determine the efficiency and sensitivity of the
affinity separation. As noted above, however, one would
prefer to have available two species of AfM(IPBD), one
with high and one with moderate affinity for the IPBD.
The species with high affinity would be used in initial
detection and in determining efficiency and sensitivity,
and the species with moderate affinity would be used in
optimization.

If the IPBD is not itself a binding domain of a known
binding protein, or if its native target has not been
purified, an antibody raised against the IPBD may be used
as the affinity molecule. Use of an antibody for this
purpose should not be taken to mean that the antibody is
the ultimate target.

There are many candidate IPBDs for which all of the
above information is available or is reasonably practical
to obtain, for example, bovine pancreatic trypsin inhibitor
(BPTI, 58 residues), CMTI-III (29 residues), crambin
(46 residues), third domain of ovomucoid (56 residues),
heat-stable enterotoxin (ST-Ia of E. coli) (18 residues),
α-Conotoxin GI (13 residues), μ-Conotoxin GIII (22
residues), Conus King Kong mini-protein (27 residues), T4
lysozyme (164 residues), and azurin (128 residues).

Structural information can be obtained from X-ray or
neutron diffraction studies, NMR, chemical cross linking
or labeling, modeling from known structures of related
proteins, or from theoretical calculations. 3D structural
information obtained by X-ray diffraction, neutron
diffraction or NMR is preferred because these methods
allow localization of almost all of the atoms to within
defined limits. Table 50 lists several preferred IPBDs.
Works related to determination of 3D structure of small
proteins via NMR include: CHAZ85, PEAS90, PEAS88, CLOR86,
CLOR87a, HEIT89, LECO87, WAGN79, and PARD89.

In some cases, a protein having some affinity for the
target may be a preferred IPBD even though some other
criteria are not optimally met. For example, the V1
domain of CD4 is a good choice as IPBD for a protein that
binds to gp120 of HIV. It is known that mutations in the
region 42 to 55 of V1 greatly affect gp120 binding and
that other mutations either have much less effect or
completely disrupt the structure of V1. Similarly, tumor
necrosis factor (TNF) would be a good initial choice if
one wants a TNF-like molecule having higher affinity for
the TNF receptor.

Membrane-bound proteins are not preferred IPBPs,
though they may serve as a source of outer surface
transport signals. One should distinguish
membrane-bound proteins, such as LamB or OmpF, that cross
the membrane several times forming a structure that is
embedded in the lipid bilayer and in which the exposed
regions are the loops that join trans-membrane segments,
from non-embedded proteins, such as the soluble domains of
CD4, that are simply anchored to the membrane. This is an
important distinction because it is quite difficult to
create a soluble derivative of a membrane-bound protein.
Soluble binding proteins are in general more useful since
purification is simpler and they are more tractable and
more versatile assay reagents.

Most of the PBDs derived from a PPBD according to the
process of the present invention will have been derived by
variegation at residues having side groups directed toward
the solvent. Reidhaar-Olson and Sauer (REID88a) found
that exposed residues can accept a wide range of amino
acids, while buried residues are more limited in this
regard. Surface mutations typically have only small
effects on melting temperature of the PBD, but may reduce
the stability of the PBD. Hence the chosen IPBD should
have a high melting temperature (50°C acceptable, the
higher the better; BPTI melts at 95°C) and be stable over
a wide pH range (8.0 to 3.0 acceptable; 11.0 to 2.0
preferred), so that the SBDs derived from the chosen IPBD
by mutation and selection-through-binding will retain
sufficient stability. Preferably, the substitutions in
the IPBD yielding the various PBDs do not reduce the
melting point of the domain below ≈40°C. Mutations may
arise that increase the stability of SBDs relative to the
IPBD, but the process of the present invention does not
depend upon this occurring. Proteins containing covalent
crosslinks, such as multiple disulfides, are usually
sufficiently stable. A protein having at least two
disulfides and having at least 1 disulfide per every twenty
residues may be presumed to be sufficiently stable.
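The disulfide rule of thumb in the preceding sentence can be sketched as follows, for illustration only. The function name `presumed_stable` is hypothetical, and reading "at least 1 disulfide per every twenty residues" as `n_disulfides * 20 >= n_residues` is an assumption about how the ratio is meant to be applied.

```python
def presumed_stable(n_residues: int, n_disulfides: int) -> bool:
    """Apply the disulfide rule of thumb for presuming adequate stability.

    Assumption: "at least 1 disulfide per every twenty residues" is read
    as n_disulfides * 20 >= n_residues.
    """
    has_two_disulfides = n_disulfides >= 2
    dense_enough = n_disulfides * 20 >= n_residues
    return has_two_disulfides and dense_enough


if __name__ == "__main__":
    # BPTI: 58 residues and three disulfides.
    print(presumed_stable(58, 3))
    # A hypothetical 58-residue domain with only two disulfides fails
    # the density criterion under this reading.
    print(presumed_stable(58, 2))
```

A domain that fails this check is not thereby shown to be unstable; the rule only identifies cases in which stability may be presumed without measurement.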

Two general characteristics of the target molecule,
size and charge, make certain classes of IPBDs more likely
than other classes to yield derivatives that will bind
specifically to the target. Because these are very
general characteristics, one can divide all targets into
six classes: a) large positive, b) large neutral, c) large
negative, d) small positive, e) small neutral, and f)
small negative. A small collection of IPBDs, one or a few
corresponding to each class of target, will contain a
preferred candidate IPBD for any chosen target.

Alternatively, the user may elect to engineer a
GP(IPBD) for a particular target; criteria are given
below that relate target size and charge to the choice of
IPBD.

II.B. Influence of target size on choice of IPBD:

If the target is a protein or other macromolecule, a
preferred embodiment of the IPBD is a small protein such
as the Cucurbita maxima trypsin inhibitor III (29
residues), BPTI from Bos taurus (58 residues), crambin from
rape seed (46 residues) or the third domain of ovomucoid
from Coturnix coturnix japonica (Japanese quail) (56
residues), because targets from this class have clefts
and grooves that can accommodate small proteins in highly
specific ways. If the target is a macromolecule lacking a
compact structure, such as starch, it should be treated as
if it were a small molecule. Extended macromolecules with
defined 3D structure, such as collagen, should be treated
as large molecules.

If the target is a small molecule, such as a steroid,
a preferred embodiment of the IPBD is a protein of about
80-200 residues, such as ribonuclease from Bos taurus (124
residues), ribonuclease from Aspergillus oryzae (104
residues), hen egg white lysozyme from Gallus gallus (129
residues), azurin from Pseudomonas aeruginosa (128
residues), or T4 lysozyme (164 residues), because such
proteins have clefts and grooves into which the small
target molecules can fit. The Brookhaven Protein Data
Bank contains 3D structures for all of the proteins
listed. Genes encoding proteins as large as T4 lysozyme
can be manipulated by standard techniques for the purposes
of this invention.

If the target is a mineral, insoluble in water, one
considers the nature of the molecular surface of the
mineral. Minerals that have smooth surfaces, such as
crystalline silicon, are best addressed with medium to
large proteins, such as ribonuclease, as IPBD in order to
have sufficient contact area and specificity. Minerals
with rough, grooved surfaces, such as zeolites, could be
bound either by small proteins, such as BPTI, or larger
proteins, such as T4 lysozyme.

II.C. Influence of target charge on choice of IPBD:

Electrostatic repulsion between molecules of like
charge can prevent molecules with highly complementary
surfaces from binding. Therefore, it is preferred that,
under the conditions of intended use, the IPBD and the
target molecule either have opposite charge or that one of
them is neutral. In some cases it has been observed that
protein molecules bind in such a way that like charged
groups are juxtaposed by including oppositely charged
counter ions in the molecular interface. Thus, inclusion
of counter ions can reduce or eliminate electrostatic
repulsion and the user may elect to include ions in the
eluants used in the affinity separation step. Polyvalent
ions are more effective at reducing repulsion than
monovalent ions.
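As a minimal sketch of the charge preference stated above (opposite charges, or one partner neutral), the net charges under the conditions of intended use may be compared by sign. The function name `charges_compatible` and the use of integer net charges are assumptions made for illustration; the sketch ignores the counter-ion effect described in the same paragraph.

```python
def charges_compatible(ipbd_charge: int, target_charge: int) -> bool:
    """Return True if the IPBD and target are not of like charge.

    The product of the two net charges is negative when the charges are
    opposite and zero when either partner is neutral.
    """
    return ipbd_charge * target_charge <= 0


if __name__ == "__main__":
    print(charges_compatible(+6, -3))   # opposite charges
    print(charges_compatible(+6, 0))    # neutral target
    print(charges_compatible(+6, +2))   # like charges
```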

II.D. Other considerations in the choice of IPBD:

If the chosen IPBD is an enzyme, it may be necessary
to change one or more residues in the active site to
inactivate enzyme function. For example, if the IPBD were
T4 lysozyme and the GP were E. coli cells or M13, we would
need to inactivate the lysozyme because otherwise it would
lyse the cells. If, on the other hand, the GP were
ΦX174, then inactivation of lysozyme may not be needed
because T4 lysozyme can be overproduced inside E. coli
cells without detrimental effects and ΦX174 forms
intracellularly. It is preferred to inactivate enzyme IPBDs
that might be harmful to the GP or its host by substituting
mutant amino acids at one or more residues of the
active site. It is permitted to vary one or more of the
residues that were changed to abolish the original
enzymatic activity of the IPBD. Those GPs that receive
osp-pbd genes encoding an active enzyme may die, but the
majority of sequences will not be deleterious.

If the binding protein is intended for therapeutic
use in humans or animals, the IPBD may be chosen from
proteins native to the designated recipient to minimize
the possibility of antigenic reactions.

II.E. Bovine Pancreatic Trypsin Inhibitor (BPTI) as an
IPBD:

BPTI is an especially preferred IPBD because it meets
or exceeds all the criteria: it is a small, very stable
protein with a well known 3D structure. Marks et al.
(MARK86) have shown that a fusion of the phoA signal
peptide gene fragment and DNA coding for the mature form
of BPTI caused native BPTI to appear in the periplasm of
E. coli, demonstrating that there is nothing in the
structure of BPTI to prevent its being secreted.

The structure of BPTI is maintained even when one or
another of the disulfides is removed, either by chemical
blocking or by genetic alteration of the amino-acid
sequence. The stabilizing influence of the disulfides in
BPTI is not equally distributed. Goldenberg (GOLD85)
reports that blocking CYS14 and CYS38 lowers the Tm of
BPTI to ≈75°C while chemical blocking of either of the
other disulfides lowers Tm to below 40°C. Chemically
blocking a disulfide may lower Tm more than mutating the
cysteines to other amino-acid types because the bulky
blocking groups are more destabilizing than removal of the
disulfide. Marks et al. (MARK87) replaced both CYS14 and
CYS38 with either two alanines or two threonines. The
CYS14/CYS38 cystine bridge that Marks et al. removed is
the one very close to the scissile bond in BPTI;
surprisingly, both mutant molecules functioned as trypsin
inhibitors. Schnabel et al. (SCHN86) report preparation
of aprotinin(C14A,C38A) by use of Raney nickel. Eigenbrot
et al. (EIGE90) report the X-ray structure of
BPTI(C30A/C51A) which is stable to at least 50°C. The
backbone of this mutant is as similar to BPTI as are the
backbones of BPTI molecules that sit in different crystal
lattices. This indicates that BPTI is redundantly stable
and so is likely to fold into approximately the same
structure despite numerous surface mutations. Using the
knowledge of homologues, vide infra, we can infer which
residues should not be varied if the basic BPTI structure
is to be maintained.

The 3D structure of BPTI has been determined at high
resolution by X-ray diffraction (HUBE77, MARQ83, WLOD84,
WLOD87a, WLOD87b), neutron diffraction (WLOD84), and by
NMR (WAGN87). In one of the X-ray structures deposited in
the Brookhaven Protein Data Bank, entry 6PTI, there was
no electron density for A58, indicating that A58 has no
uniquely defined conformation. Thus we know that the
carboxy group does not make any essential interaction in
the folded structure. The amino terminus of BPTI is very
near to the carboxy terminus. Goldenberg and Creighton
reported on circularized BPTI and circularly permuted BPTI
(GOLD83). Some proteins homologous to BPTI have more or
fewer residues at either terminus.

BPTI has been called "the hydrogen atom of protein
folding" and has been the subject of numerous experimental
and theoretical studies (STAT87, SCHW87, GOLD83, CHAZ83,
CREI74, CREI77a, CREI77b, CREI80, SIEK87, SINH90, RUEH73,
HUBE74, HUBE75, HUBE77 and others).

BPTI has the added advantage that at least 59
homologous proteins are known. Table 13 shows the
sequences of 39 homologues. A tally of ionizable groups
in 59 homologues is shown in Table 14 and the composite of
amino acid types occurring at each residue is shown in
Table 15.

BPTI is freely soluble and is not known to bind metal
ions. BPTI has no known enzymatic activity. BPTI is not
toxic.

All of the conserved residues are buried; of the six
fully conserved residues only G37 has noticeable exposure.
The solvent accessibility of each residue in BPTI is given
in Table 16 which was calculated from the entry "6PTI" in
the Brookhaven Protein Data Bank with a solvent radius of
1.4 Å, the atomic radii given in Table 7, and the method
of Lee and Richards (LEEB71). Each of the 52 non-conserved
residues can accommodate two or more kinds of amino
acids. By independently substituting at each residue only
those amino acids already observed at that residue, we
could obtain approximately 1.6·10^43 different amino acid
sequences, most of which will fold into structures very
similar to BPTI.
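The count of about 1.6·10^43 sequences is the product, over the 52 non-conserved positions, of the number of amino acid types observed at each position. The sketch below illustrates that arithmetic. Table 15 is not reproduced in this passage, so the three-position list is hypothetical and the function names are assumptions; only the totals 52 and 1.6·10^43 come from the text.

```python
import math


def count_sequences(types_per_position):
    """Number of sequences obtained by varying each position independently."""
    return math.prod(types_per_position)


def implied_geometric_mean(total_sequences: float, n_positions: int) -> float:
    """Average number of types per position implied by a stated total."""
    return total_sequences ** (1.0 / n_positions)


if __name__ == "__main__":
    # Hypothetical three-position example: 2, 3 and 5 observed types.
    print(count_sequences([2, 3, 5]))  # 2 * 3 * 5 = 30

    # Consistency check on the figures stated in the text.
    mean_types = implied_geometric_mean(1.6e43, 52)
    print(f"{mean_types:.2f} types per position on average")
```

Under these figures the stated total corresponds to a geometric mean of between six and seven observed amino acid types per variable position.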

BPTI will be especially useful as an IPBD for
macromolecular targets. BPTI and BPTI homologues bind tightly
and with high specificity to a number of enzyme
macromolecules.

BPTI is strongly positively charged except at very
high pH; thus BPTI is useful as IPBD for targets that are
not also strongly positive under the conditions of
intended use. There exist homologues of BPTI, however,
having quite different charges (viz. SCI-III from Bombyx
mori at -7 and the trypsin inhibitor from bovine colostrum
at -1). Once a genetic package is found that displays
BPTI on its surface, the sequence of the BPTI domain can
be replaced by one of the homologous sequences to produce
acidic or neutral IPBDs.

BPTI is quite small; if this should cause a
pharmacological problem, two or more BPTI-derived domains may be
joined as in human BPTI homologues, one of which has two
domains (BALD85, ALBR83b) and another has three (WUNT88).

Another possible pharmacological problem is
immunogenicity. BPTI has been used in humans with very few
adverse effects. Siekmann et al. (SIEK89) have studied
immunological characteristics of BPTI and some homologues.
It is an advantage of the method of the present invention
that a variety of SBDs can be obtained so that, if one
derivative proves to be antigenic, a different SBD may be
used. Furthermore, one can reduce the probability of
immune response by starting with a human protein, such as
LACI (a BPTI homologue) (WUNT88, GIRA89) or Inter-α-
Trypsin Inhibitor (ALBR83a, ALBR83b, DIAR90, ENGH89,
TRIB86, GEBH86, GEBH90, KAUM86, ODOM90, SALI90).

Further, a BPTI-derived gene fragment, coding for a
novel binding domain, could be fused in-frame to a gene
fragment coding for other proteins, such as serum albumin
or the constant parts of IgG.

Tschesche et al. (TSCH87) reported on the binding of
several BPTI derivatives to various proteases:

Dissociation constants for BPTI derivatives, Molar.

Residue   Trypsin       Chymotrypsin  Elastase      Elastase
#15       (bovine       (bovine       (porcine      (human
          pancreas)     pancreas)     pancreas)     leukocytes)

lysine    6.0·10^-14    9.0·10^-9     [illegible]   3.5·10^-6
glycine   + (column uncertain; other entries illegible)   7.0·10^-9
alanine   +             -             2.8·10^-8     2.5·10^-9
valine    -             -             5.7·10^-8     1.1·10^-10
leucine   -             -             1.9·10^-8     2.9·10^[illegible]

From the report of Tschesche et al. we infer that
molecular pairs marked "+" have K_ds > 3.5·10^-6 M and that
molecular pairs marked "-" have K_ds >> 3.5·10^-6 M.
Because of the wealth of data about the binding of BPTI
and various mutants to trypsin and other proteases
(TSCH87), we can proceed in various ways in optimizing the
affinity separation conditions. (For other PBDs, we can
obtain two different monoclonal antibodies, one with a
high affinity having K_d of order 10^-11 M, and one with a
moderate affinity having K_d on the order of 10^-6 M.)

Works concerning BPTI and its homologues include:
KIDO88, PONT88, KIDO90, AUER87, AUER90, SCOT87b, AUER88,
AUER89, BECK88b, WACH79, WACH80, BECK89a, DUFT85, FIOR88,
GIRA89, GOLD84, GOLD88, HOCH84, RITO83, NORR89a, NORR89b,
OLTE89, SWAI88, and WAGN79.

II.F. Mini-Proteins as IPBDs:

A polypeptide is a polymer composed of a single chain
of the same or different amino acids joined by peptide
bonds. Linear peptides can take up a very large number of
different conformations through internal rotations about
the main-chain single bonds of each α carbon. These
rotations are hindered to varying degrees by side groups,
with glycine interfering the least, and valine, isoleucine
and, especially, proline, the most. A polypeptide of 20
residues may have 10^20 different conformations which it
may assume by various internal rotations.

Proteins are polypeptides which, as a result of
stabilizing interactions between amino acids that are not
in adjacent positions in the chain, have folded into a
well-defined conformation. This folding is usually
essential to their biological activity.

For polypeptides of 40-60 residues or longer,
noncovalent forces such as hydrogen bonds, salt bridges,
and hydrophobic interactions are sufficient to stabilize
a particular folding or conformation. The polypeptide's
constituent segments are held to more or less that
conformation unless it is perturbed by a denaturant such
as rising temperature or decreasing pH, whereupon the
polypeptide unfolds or "melts". The smaller the peptide,
the more likely it is that its conformation will be
determined by the environment. If a small unconstrained
peptide has biological activity, the peptide ligand will
be in essence a random coil until it comes into proximity
with its receptor. The receptor accepts the peptide only
in one or a few conformations because alternative
conformations are disfavored by unfavorable van der Waals and
other non-covalent interactions.

Small polypeptides have potential advantages over
larger polypeptides when used as therapeutic or diagnostic
agents, including (but not limited to):

a) better penetration into tissues,

b) faster elimination from the circulation (important
for imaging agents),

c) lower antigenicity, and

d) higher activity per mass.

Moreover, polypeptides of under about 50 residues
have the advantage of accessibility via chemical
synthesis; polypeptides of under about 30 residues are more
easily synthesized than are larger polypeptides. Thus, it
would be desirable to be able to employ the combination of
variegation and affinity selection to identify small
polypeptides which bind a target of choice.

Polypeptides of this size, however, have disadvantages
as binding molecules. According to Olivera et al.
(OLIV90a): "Peptides in this size range normally
equilibrate among many conformations (in order to have a fixed
conformation, proteins generally have to be much larger)."
Specific binding of a peptide to a target molecule
requires the peptide to take up one conformation that is
complementary to the binding site. For a decapeptide with
three isoenergetic conformations (e.g., β strand, α helix,
and reverse turn) at each residue, there are about 6·10^4
possible overall conformations. Assuming these conformations
to be equi-probable for the unconstrained decapeptide,
if only one of the possible conformations bound to
the binding site, then the affinity of the peptide for the
target would be expected to be about 6·10^4-fold higher if it
could be constrained to that single effective conformation.
Thus, the unconstrained decapeptide, relative to a
decapeptide constrained to the correct conformation, would
be expected to exhibit lower affinity. It would also
exhibit lower specificity, since one of the other
conformations of the unconstrained decapeptide might be one
which bound tightly to a material other than the intended
target. By way of corollary, it could have less resistance
to degradation by proteases, since it would be more
likely to provide a binding site for the protease.
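The figure of about 6·10^4 follows from raising the number of conformations per residue to the power of the chain length (3^10 = 59,049). A minimal sketch of that arithmetic is given below; the function name `n_conformations` is hypothetical, and treating each residue as independent and each conformation as equally probable follows the simplifying assumption made in the text.

```python
def n_conformations(states_per_residue: int, n_residues: int) -> int:
    """Overall conformations when every residue varies independently."""
    return states_per_residue ** n_residues


if __name__ == "__main__":
    # Decapeptide with three isoenergetic conformations per residue.
    total = n_conformations(3, 10)
    print(total)            # 59049, i.e. about 6e4
    print(f"{total:.0e}")   # one significant figure

    # If only one conformation binds, the constrained peptide is expected
    # to bind about `total`-fold more tightly than the unconstrained one.
    fraction_in_binding_conformation = 1 / total
    print(fraction_in_binding_conformation)
```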

In one embodiment, the present invention overcomes
these problems, while retaining the advantages of smaller
polypeptides, by fostering the biosynthesis of novel
mini-proteins having the desired binding characteristics.
Mini-proteins are small polypeptides (usually less than
about 60 residues) which, while too small to have a stable
conformation as a result of noncovalent forces alone, are
covalently crosslinked (e.g., by disulfide bonds) into a
stable conformation and hence have biological activities
more typical of larger protein molecules than of
unconstrained polypeptides of comparable size.

When mini-proteins are variegated, the residues which
are covalently crosslinked in the parental molecule are
left unchanged, thereby stabilizing the conformation. For
example, in the variegation of a disulfide bonded
mini-protein, certain cysteines are invariant so that under the
conditions of expression and display, covalent crosslinks
(e.g., disulfide bonds between one or more pairs of
cysteines) form, and substantially constrain the
conformation which may be adopted by the hypervariable linearly
intermediate amino acids. In other words, a constraining
scaffolding is engineered into polypeptides which are
otherwise extensively randomized.

Once a mini-protein of desired binding characteristics
is characterized, it may be produced, not only by
recombinant DNA techniques, but also by nonbiological
synthetic methods.

In vitro, disulfide bridges can form spontaneously in
polypeptides as a result of air oxidation. Matters are
more complicated in vivo. Very few intracellular proteins
have disulfide bridges, probably because a strong reducing
environment is maintained by the glutathione system.
Disulfide bridges are common in proteins that travel or
operate in extracellular spaces, such as snake venoms and
other toxins (e.g., conotoxins, charybdotoxin, bacterial
enterotoxins), peptide hormones, digestive enzymes,
complement proteins, immunoglobulins, lysozymes, protease
inhibitors (BPTI and its homologues, CMTI-III (Cucurbita
maxima trypsin inhibitor III) and its homologues, hirudin,
etc.) and milk proteins.

Disulfide bonds that close tight intrachain loops
have been found in pepsin, thioredoxin, insulin A-chain,
silk fibroin, and lipoamide dehydrogenase. The bridged
cysteine residues are separated by one to four residues
along the polypeptide chain. Model building, X-ray
diffraction analysis, and NMR studies have shown that the
α carbon path of such loops is usually flat and rigid.

There are two types of disulfide bridges in
immunoglobulins. One is the conserved intrachain bridge,
spanning about 60 to 70 amino acid residues and found,
repeatedly, in almost every immunoglobulin domain. Buried
deep between the opposing β sheets, these bridges are
shielded from solvent and ordinarily can be reduced only
in the presence of denaturing agents. The remaining
disulfide bridges are mainly interchain bonds and are
located on the surface of the molecule; they are
accessible to solvent and relatively easily reduced (STEI85).
The disulfide bridges of the mini-proteins of the present
invention are intrachain linkages between cysteines having
much smaller chain spacings.

For the purpose of the appended claims, a
mini-protein has between about eight and about sixty residues.
However, it will be understood that a chimeric surface
protein presenting a mini-protein as a domain will
normally have more than sixty residues. Polypeptides
containing intrachain disulfide bonds may be characterized
as cyclic in nature, since a closed circle of covalently
bonded atoms is defined by the two cysteines, the
intermediate amino acid residues, their peptidyl bonds, and the
disulfide bond. The terms "cycle", "span" and "segment"
will be used to define certain structural features of the
polypeptides. An intrachain disulfide bridge connecting
amino acids 3 and 8 of a 16 residue polypeptide will be
said herein to have a cycle of 6 and a span of 4. If
amino acids 4 and 12 are also disulfide bonded, then they
form a second cycle of 9 with a span of 7. Together, the
four cysteines divide the polypeptide into four
intercysteine segments (1-2, 5-7, 9-11, and 13-16). (Note that
there is no segment between Cys3 and Cys4.)
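The terms "cycle", "span" and "segment" can be made concrete with a short sketch that reproduces the 16-residue example above. The function names are hypothetical; residues are numbered from 1, as in the text.

```python
def cycle_and_span(cys_a: int, cys_b: int):
    """Return (cycle, span) for a disulfide between two residue positions.

    The cycle counts both cysteines plus the residues between them;
    the span counts only the residues between them.
    """
    lo, hi = sorted((cys_a, cys_b))
    return hi - lo + 1, hi - lo - 1


def intercysteine_segments(length: int, cysteines):
    """Return the (start, end) residue ranges lying between cysteines."""
    segments = []
    start = 1
    for cys in sorted(cysteines):
        if start <= cys - 1:          # skip empty gaps such as Cys3/Cys4
            segments.append((start, cys - 1))
        start = cys + 1
    if start <= length:               # residues after the last cysteine
        segments.append((start, length))
    return segments


if __name__ == "__main__":
    print(cycle_and_span(3, 8))    # (6, 4)
    print(cycle_and_span(4, 12))   # (9, 7)
    print(intercysteine_segments(16, [3, 4, 8, 12]))
    # [(1, 2), (5, 7), (9, 11), (13, 16)]
```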

The connectivity pattern of a crosslinked
mini-protein is a simple description of the relative location
of the termini of the crosslinks. For example, for a
mini-protein with two disulfide bonds, the connectivity
pattern "1-3, 2-4" means that the first crosslinked
cysteine is disulfide bonded to the third crosslinked
cysteine (in the primary sequence), and the second to the
fourth.
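As an illustrative sketch, a connectivity pattern can be derived from the residue positions of the bonded cysteines by ranking the cysteines in order along the primary sequence. The function name `connectivity_pattern` is hypothetical. The first call uses the disulfides 3-8 and 4-12 from the 16-residue example given earlier in this section; the second uses hypothetical positions.

```python
def connectivity_pattern(bonds) -> str:
    """Describe disulfide connectivity by cysteine rank in the sequence.

    `bonds` is a list of (position, position) pairs, one per disulfide.
    """
    # Rank every crosslinked cysteine by its position in the sequence.
    cysteines = sorted(pos for bond in bonds for pos in bond)
    rank = {pos: k for k, pos in enumerate(cysteines, start=1)}
    # Express each bond as a pair of ranks, lowest rank first.
    pairs = sorted(tuple(sorted((rank[a], rank[b]))) for a, b in bonds)
    return ", ".join(f"{a}-{b}" for a, b in pairs)


if __name__ == "__main__":
    print(connectivity_pattern([(3, 8), (4, 12)]))    # 1-3, 2-4
    print(connectivity_pattern([(2, 20), (5, 10)]))   # 1-4, 2-3
```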

The degree to which the crosslink constrains the
conformational freedom of the mini-protein, and the degree
to which it stabilizes the mini-protein, may be assessed
by a number of means. These include absorption
spectroscopy (which can reveal whether an amino acid is buried or
exposed), circular dichroism studies (which provide a
general picture of the helical content of the protein),
nuclear magnetic resonance imaging (which reveals the
number of nuclei in a particular chemical environment as
well as the mobility of nuclei), and X-ray or neutron
diffraction analysis of protein crystals. The stability
of the mini-protein may be ascertained by monitoring the
changes in absorption at various wavelengths as a function
of temperature, pH, etc.; buried residues become exposed
as the protein unfolds. Similarly, the unfolding of the
mini-protein as a result of denaturing conditions results
in changes in NMR line positions and widths. Circular
dichroism (CD) spectra are extremely sensitive to
conformation.

The variegated disulfide-bonded mini-proteins of the
present invention fall into several classes.

Class I mini-proteins are those featuring a single
pair of cysteines capable of interacting to form a
disulfide bond, said bond having a span of no more than
nine residues. This disulfide bridge preferably has a
span of at least two residues; this is a function of the
geometry of the disulfide bond. When the spacing is two
or three residues, one residue is preferably glycine in
order to reduce the strain on the bridged residues. The
upper limit on spacing is less precise; however, in
general, the greater the spacing, the less the constraint
on conformation imposed on the linearly intermediate amino
acid residues by the disulfide bond.

The main chain of such a peptide has very little
freedom, but is not stressed. The free energy released
when the disulfide forms exceeds the free energy lost by
the main-chain when locked into a conformation that brings
the cysteines together. Having lost the free energy of
disulfide formation, the proximal ends of the side groups
are held in more or less fixed relation to each other.
When binding to a target, the domain does not need to
expend free energy getting into the correct conformation.
The domain cannot jump into some other conformation and
bind a non-target.

A disulfide bridge with a span of 4 or 5 is espe- 
cially preferred. If the span is increased to 6, the 
constraining influence is reduced. In this case, we 
prefer that at least one of the enclosed residues be an 
amino acid that imposes restrictions on the main-chain 
geometry. Proline imposes the most restriction. Valine 
and isoleucine restrict the main chain to a lesser extent. 
The preferred position for this constraining non-cysteine
residue is adjacent to one of the invariant cysteines;
however, it may be one of the other bridged residues. If
the span is seven, we prefer to include two amino acids 
that limit main-chain conformation. These amino acids 
could be at any of the seven positions, but are preferably 
the two bridged residues that are immediately adjacent to 
the cysteines. If the span is eight or nine, additional 
constraining amino acids may be provided. 

The disulfide bond of a class I mini-protein is
exposed to solvent. Thus, one should avoid exposing the
variegated population of GPs that display class I mini-
proteins to reagents that rupture disulfides; Creighton
names several such reagents (CREI88).

Class II mini-proteins are those featuring a single 
disulfide bond having a span of greater than nine amino 
acids. The bridged amino acids form secondary structures 
which help to stabilize their conformation. Preferably, 
these intermediate amino acids form hairpin supersecondary 
structures such as those schematized below: 

      |--------- S-S ---------|
     -Cys-αhelix-turn-βstrand-Cys-

      |--------- S-S ---------|
     -Cys-αhelix-turn-αhelix-Cys-

      |---------- S-S ----------|
     -Cys-βstrand-turn-βstrand-Cys-



Secondary structures are stabilized by hydrogen bonds
between amide nitrogen and carbonyl groups, by interac-
tions between charged side groups and helix dipoles, and
by van der Waals contacts. One abundant secondary
structure in proteins is the α-helix. The α helix has 3.6
residues per turn, a 1.5 Å rise per residue, and a helical
radius of 2.3 Å. All observed α-helices are right-handed.
The torsion angles φ (-57°) and ψ (-47°) are favorable
for most residues, and the hydrogen bond between the
backbone carbonyl oxygen of each residue and the backbone
NH of the fourth residue along the chain is 2.86 Å long
(nearly the optimal distance) and virtually straight.
Since the hydrogen bonds all point in the same direction,
the α helix has a considerable dipole moment (carboxy
terminus negative).

The β strand may be considered an elongated helix
with 2.3 residues per turn, a translation of 3.3 Å per
residue, and a helical radius of 1.0 Å. Alone, a β strand
forms no main-chain hydrogen bonds. Most commonly, β
strands are found in twisted (rather than planar) paral-
lel, antiparallel, or mixed parallel/antiparallel sheets.

A peptide chain can form a sharp reverse turn. A 
reverse turn may be accomplished with as few as four amino 
acids. Reverse turns are very abundant, comprising a 
quarter of all residues in globular proteins. In pro-
teins, reverse turns commonly connect β strands to form β
sheets, but may also form other connections. A peptide
can also form other turns that are less sharp. 

Based on studies of known proteins, one may calculate 
the propensity of a particular residue, or of a particular 
dipeptide or tripeptide, to be found in an α helix, β
strand or reverse turn. The normalized frequencies of
occurrence of the amino acid residues in these secondary
structures are given in Table 6-4 of CREI84. For a more
detailed treatment on the prediction of secondary struc-
ture from the amino acid sequence, see Chapter 6 of
SCHU79.

In designing a suitable hairpin structure, one may 
copy an actual structure from a protein whose three-
dimensional conformation is known, design the structure
using frequency data, or combine the two approaches.
Preferably, one or more actual structures are used as a
model, and the frequency data is used to determine which
mutations can be made without disrupting the structure.

Preferably, no more than three amino acids lie
between the cysteine and the beginning or end of the α
helix or β strand.

More complex structures (such as a double hairpin)
are also possible.

Class III mini-proteins are those featuring a
plurality of disulfide bonds. They optionally may also
feature secondary structures such as those discussed above
with regard to Class II mini-proteins. Since the number
of possible disulfide bond topologies increases rapidly
with the number of bonds (two bonds, three topologies;
three bonds, 15 topologies; four bonds, 105 topologies),
the number of disulfide bonds preferably does not exceed
four. With two or more disulfide bonds, the disulfide
bridge spans preferably do not exceed 50, and the largest
intercysteine chain segment preferably does not exceed 20.
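
The topology counts quoted above (3, 15, 105) can be reproduced by counting the ways of pairing 2n cysteines into n disulfides, which is (2n)!/(2^n n!). The following is a minimal sketch, assuming every cysteine is paired; the function name is illustrative and is not part of the disclosure.

```python
from math import factorial


def disulfide_topologies(n_bonds: int) -> int:
    """Count the ways to pair 2*n_bonds cysteines into n_bonds disulfides.

    Assumes all cysteines are paired (no free thiols); the count is
    (2n)! / (2**n * n!).
    """
    if n_bonds < 0:
        raise ValueError("n_bonds must be non-negative")
    return factorial(2 * n_bonds) // (2 ** n_bonds * factorial(n_bonds))


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, "bonds:", disulfide_topologies(n), "topologies")
```

The rapid growth of this count is the reason given in the text for preferring no more than four disulfide bonds.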

Naturally occurring class III mini-proteins, such as
heat-stable enterotoxin ST-Ia, frequently have pairs of
cysteines that are adjacent in the amino-acid sequence.
Adjacent cysteines are very unlikely to form an intra-
molecular disulfide, and cysteines separated by a single
amino acid form an intramolecular disulfide with dif-
ficulty and only for certain intervening amino acids.
Thus, clustering cysteines within the amino-acid sequence
reduces the number of realizable disulfide bonding 
schemes. We utilize such clustering in the class III 
mini-protein disclosed herein. 

Metal Finger Mini-Proteins. The mini-proteins of the
present invention are not limited to those crosslinked by
disulfide bonds. Another important class of mini-proteins
are analogues of finger proteins. Finger proteins are
characterized by finger structures in which a metal ion is
coordinated by two Cys and two His residues, forming a
tetrahedral arrangement around it. The metal ion is most
often zinc(II), but may be iron, copper, cobalt, etc. The
"finger" has the consensus sequence (Phe or Tyr)-(1 AA)-
Cys-(2-4 AAs)-Cys-(3 AAs)-Phe-(5 AAs)-Leu-(2 AAs)-His-(3
AAs)-His-(5 AAs) (BERG88; GIBS88). While finger proteins
typically contain many repeats of the finger motif, it is
known that a single finger will fold in the presence of
zinc ions (FRAN87; PARH88). There is some dispute as to
whether two fingers are necessary for binding to DNA. The
present invention encompasses mini-proteins with either
one or two fingers. It is to be understood that the
target need not be a nucleic acid.
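
The finger consensus stated above can be written as a pattern over one-letter amino acid codes. The following is a sketch only: the pattern name, the helper function, and the test sequence are hypothetical, and the test sequence was built solely to satisfy the spacing rules; it is not a real finger protein.

```python
import re

# Consensus from the text, in one-letter code:
# (F or Y)-(1 AA)-C-(2-4 AAs)-C-(3 AAs)-F-(5 AAs)-L-(2 AAs)-H-(3 AAs)-H-(5 AAs)
FINGER_CONSENSUS = re.compile(r"[FY].C.{2,4}C.{3}F.{5}L.{2}H.{3}H.{5}")


def matches_finger_consensus(seq: str) -> bool:
    """Return True if the whole sequence fits the consensus spacing."""
    return FINGER_CONSENSUS.fullmatch(seq.upper()) is not None


# Hypothetical sequence assembled segment by segment, with Ala as filler.
EXAMPLE = ("Y" + "A" + "C" + "AA" + "C" + "AAA" + "F" + "AAAAA"
           + "L" + "AA" + "H" + "AAA" + "H" + "AAAAA")

if __name__ == "__main__":
    print(EXAMPLE, matches_finger_consensus(EXAMPLE))
```

In a variegation design, the positions written as `.` in the pattern are the candidates for variation, while the Cys and His positions are held invariant.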

G. Modified PBSs 

There exist a number of enzymes and chemical reagents 
that can selectively modify certain side groups of 
proteins, including: protein-tyrosine kinase, Ellman's
reagent, methyl transferases (that methylate GLU side
groups), serine kinases, proline hydroxylases, vitamin-K
dependent enzymes that convert GLU to GLA, maleic anhy-
dride, and alkylating agents. Treatment of the variegated
population of GP(PBD)s with one of these enzymes or
reagents will modify the side groups affected by the
chosen enzyme or reagent. Enzymes and reagents that do
not kill the GP are much preferred. Such modification of
side groups can directly affect the binding properties of
the displayed PBDs. Using affinity separation methods, we
enrich for the modified GPs that bind the predetermined
target. Since the active binding domain is not entirely
genetically specified, we must repeat the post-morpho-
genesis modification at each enrichment round. This
approach is particularly appropriate with mini-protein
IPBDs because we envision chemical synthesis of these
SBDs.

III. VARIEGATION STRATEGY: MUTAGENESIS TO OBTAIN
POTENTIAL BINDING DOMAINS WITH DESIRED DIVERSITY

III.A. Generally

Using standard genetic engineering techniques, a
molecule of variegated DNA can be introduced into a vector
so that it constitutes part of a gene (OLIP86, OLIP87,
AUSU87, REID88a). When vectors containing variegated DNA
are used to transform bacteria, each cell makes a version
of the original protein. Each colony of bacteria may
produce a different version from any other colony. If the
variegations of the DNA are concentrated at loci known to
be on the surface of the protein or in a loop, a popula-
tion of proteins will be generated, many members of which 
will fold into roughly the same 3D structure as the parent 
protein. The specific binding properties of each member, 
however, may be different from each other member. 

We now consider the manner in which we generate a
diverse population of potential binding domains in order
to facilitate selection of a PBD-bearing GP which binds
with the requisite affinity to the target of choice. The
potential binding domains are first designed at the amino
acid level. Once we have identified which residues are to
be mutagenized, and which mutations to allow at those
positions, we may then design the variegated DNA which is
to encode the various PBDs so as to assure that there is a
reasonable probability that if a PBD has an affinity for
the target, it will be detected. Of course, the number of
independent transformants obtained and the sensitivity of
the affinity separation technology will impose limits on
the extent of variegation possible within any single round
of variegation.

There are many ways to generate diversity in a
protein. (See RICH86, CARU85, and OLIP86.) At one
extreme, we vary a few residues of the protein as much as
possible (inter alia see CARU85, CARU87, RICH86, and
WHAR86). We will call this approach "Focused Mutagen-
esis". A typical "Focused Mutagenesis" strategy is to
pick a set of five to seven residues and vary each through
13-20 possibilities. An alternative plan of mutagenesis
("Diffuse Mutagenesis") is to vary many more residues
through a more limited set of choices (See VERS86a and
PAKU86). The variegation pattern adopted may fall between
these extremes, e.g., two residues varied through all
twenty amino acids, two more through only two possibil-
ities, and a fifth into ten of the twenty amino acids.

There is no fixed limit on the number of codons which
can be mutated simultaneously. However, it is desirable
to adopt a mutagenesis strategy which results in a
reasonable probability that a possible PBD sequence is in
fact displayed by at least one genetic package. When the
size of the set of amino acids potentially encoded by each
variable codon is the same for all variable codons and
within the set all amino acids are equiprobable, this
probability may be calculated as follows: Let r(k,q) be
the probability that amino acid number k will occur at
variegated codon q; these codons need not be contiguous.
The probability that a particular vgDNA molecule will
encode a PBD containing n variegated amino acids k_1, ...,
k_n is:

$$p(k_1, \ldots, k_n) = r(k_1, 1) \cdot \ldots \cdot r(k_n, n)$$

Consider a library of N_it independent transformants
prepared with said vgDNA; the probability that the
sequence k_1, ..., k_n is absent is:

$$P(\text{missing } k_1, \ldots, k_n) = \exp\{-N_{it} \cdot p(k_1, \ldots, k_n)\}$$

$$P(k_1, \ldots, k_n \text{ in lib}) = 1 - \exp\{-N_{it} \cdot p(k_1, \ldots, k_n)\}$$

Preferably, the probability that a mutein encoded by the
vgDNA and composed of the least favored amino acids at
each variegated position will be displayed by at least one
independent transformant in the library is at least 0.50,
and more preferably at least 0.90. (Muteins composed of
more favored amino acids would of course be more likely to
occur in the same library.)
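
The probability that a given sequence is present in a library of N_it independent transformants can be evaluated numerically. The following is a minimal sketch of that calculation; the function names and the example value r = 1/32 for the least favored amino acid at each of five codons are assumptions chosen for illustration and are not taken from the disclosure.

```python
import math


def sequence_probability(per_codon_probs):
    """p(k_1..k_n): product of r(k_q, q) over the variegated codons."""
    return math.prod(per_codon_probs)


def prob_in_library(n_it, p):
    """P(sequence in library) = 1 - exp(-N_it * p)."""
    return 1.0 - math.exp(-n_it * p)


def transformants_needed(p, target):
    """N_it (real-valued) at which prob_in_library(N_it, p) equals target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    return -math.log(1.0 - target) / p


if __name__ == "__main__":
    # Assumed example: five variegated codons, least favored residue at 1/32.
    p_least = sequence_probability([1 / 32] * 5)
    for target in (0.50, 0.90):
        n = transformants_needed(p_least, target)
        print(f"target {target:.2f}: about {n:.3g} independent transformants")
```

Solving for N_it in this way indicates whether a proposed variegation meets the preferred 0.50 or 0.90 threshold for the least favored mutein with the number of transformants that can actually be obtained.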

Preferably, the variegation is such as will cause a
typical transformant population to display 10^6 to 10^7
different amino acid sequences by means of preferably not
more than 10-fold more (more preferably not more than 3-
fold) different DNA sequences.

For a mini-protein that lacks α helices and β
strands, one will, in any given round of mutation,
preferably variegate each of 4-6 non-cysteine codons so
that they each encode at least eight of the 20 possible
amino acids. The variegation at each codon could be
customized to that position. Preferably, cysteine is not
one of the potential substitutions, though it is not
excluded.

When the mini-protein is a metal finger protein, in a 
typical variegation strategy, the two Cys and two His
residues, and optionally also the aforementioned Phe/Tyr,
Phe and Leu residues, are held invariant and a plurality 
(usually 5-10) of the other residues are varied. 

When the mini-protein is of the type featuring one or
more α helices and β strands, the set of potential amino
acid modifications at any given position is picked to
favor those which are less likely to disrupt the secondary
structure at that position. Since the number of possibil-
ities at each variable amino acid is more limited, the
total number of variable amino acids may be greater
without altering the sampling efficiency of the selection
process.

For the last-mentioned class of mini-proteins, as
well as domains other than mini-proteins, preferably not
more than 20 and more preferably 5-10 codons will be
variegated. However, if diffuse mutagenesis is employed, 
the number of codons which are variegated can be higher. 

The decision as to which residues to modify is eased 
by knowledge of which residues lie on the surface of the
domain and which are buried in the interior. 

We choose residues in the IPBD to vary through
consideration of several factors, including: a) the 3D
structure of the IPBD, b) sequences homologous to IPBD,
and c) modeling of the IPBD and mutants of the IPBD. When
the number of residues that could strongly influence
binding is greater than the number that should be varied
simultaneously, the user should pick a subset of those
residues to vary at one time. The user picks trial levels
of variegation and calculates the abundances of various
sequences. The list of varied residues and the level of
variegation at each varied residue are adjusted until the
composite variegation is commensurate with the sensitivity
of the affinity separation and the number of independent
transformants that can be made.

Preferably, the abundance of PPBD-encoding DNA is 3
to 10 times higher than both 1/M_ntv and 1/C_sensi to
provide a margin of redundancy. M_ntv is the number of
transformants that can be made from Y_D100 DNA. With
current technology M_ntv is approximately 5·10^8, but the
exact value depends on the details of the procedures
adopted by the user. Improvements in technology that
allow more efficient: a) synthesis of DNA, b) ligation of
DNA, or c) transformation of cells will raise the value of
M_ntv. C_sensi is the sensitivity of the affinity separa-
tion; improvements in affinity separation will raise
C_sensi. If the smaller of M_ntv and C_sensi is increased,
higher levels of variegation may be used. For example, if
C_sensi is 1 in 10^9 and M_ntv is 10^8, then improvements in
C_sensi are less valuable than improvements in M_ntv.
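
The redundancy criterion above can be checked for a trial variegation scheme. The following is a rough sketch under stated assumptions: all amino acids within each substitution set are taken to be equiprobable, so that the abundance of any one sequence is 1/V where V is the product of the set sizes, and C_sensi is expressed as the denominator of the sensitivity (10^9 for "1 in 10^9"). The function names and the default margin are illustrative.

```python
import math


def total_variegation(set_sizes):
    """Product of the number of allowed amino acids at each varied residue."""
    return math.prod(set_sizes)


def within_budget(set_sizes, m_ntv, c_sensi, margin=3.0):
    """True if abundance 1/V is at least `margin` times 1/m_ntv and 1/c_sensi.

    Assumes equiprobable amino acids at each varied residue (hypothetical
    simplification); `margin` corresponds to the 3- to 10-fold redundancy.
    """
    v = total_variegation(set_sizes)
    return (1.0 / v) >= margin * max(1.0 / m_ntv, 1.0 / c_sensi)


if __name__ == "__main__":
    m_ntv, c_sensi = 1e8, 1e9  # values from the example in the text
    for scheme in ([20] * 5, [20] * 6, [20, 20, 2, 2, 10]):
        print(scheme, total_variegation(scheme),
              within_budget(scheme, m_ntv, c_sensi))
```

With these example values the limiting quantity is M_ntv, which is why improvements in C_sensi alone would not permit a higher level of variegation.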

While variegation normally will involve the substitu- 
tion of one amino acid for another at a designated 
variable codon, it may involve the insertion or deletion 
of amino acids as well. 

III.B. Identification of Residues to be Varied 

We now consider the principles that guide our choice 
of residues of the IPBD to vary. A key concept is that
only structured proteins exhibit specific binding, i.e. 
can bind to a particular chemical entity to the exclusion 
of most others. Thus the residues to be varied are chosen 
with an eye to preserving the underlying IPBD structure. 
Substitutions that prevent the PBD from folding will cause 
GPs carrying those genes to bind indiscriminately so that 
they can easily be removed from the population. 

Sauer and colleagues (PAKU86, REID88) , and Caruthers 
and colleagues (EISE85) have shown that some residues on 
the polypeptide chain are more important than others in 
determining the 3D structure of a protein. The 3D
structure is essentially unaffected by the identity of the 
amino acids at some loci; at other loci only one or a few 
types of amino acid is allowed. In most cases, loci where 
wide variety is allowed have the amino acid side group 
directed toward the solvent. Loci where limited variety 
is allowed frequently have the side group directed toward 
other parts of the protein. Thus substitutions of amino 
acids that are exposed to solvent are less likely to 
affect the 3D structure than are substitutions at internal 
loci. (See also SCHU79, p169-171 and CREI84, p239-245,
314-315).

The residues that join helices to helices, helices to 
sheets, and sheets to sheets are called turns and loops 
and have been classified by Richardson (RICH81) , Thornton 
(THOR88), Sutcliffe et al. (SUTC87a) and others. Inser-
tions and deletions are more readily tolerated in loops
than elsewhere. Thornton et al. (THOR88) have summarized
many observations indicating that related proteins usually 
differ most at the loops which join the more regular 
elements of secondary structure. (These observations are 
relevant not only to the variegation of potential binding 
domains but also to the insertion of binding domains into 
an outer surface protein of a genetic package, as dis- 
cussed in a later section.) 

Burial of hydrophobic surfaces so that bulk water is 
excluded is one of the strongest forces driving the 
binding of proteins to other molecules. Bulk water can be 
excluded from the region between two molecules only if the 
surfaces are complementary. We should test as many 
surface variations as possible to find one that is 
complementary to the target. The selection-through- 
binding isolates those proteins that are more nearly 
complementary to some surface on the target.

Proteins do not have distinct, countable faces.
Therefore we define an "interaction set" to be a set of
residues such that all members of the set can simul-
taneously touch one molecule of the target material
without any atom of the target coming closer than van der
Waals distance to any main-chain atom of the IPBD. The
concept of a residue "touching" a molecule of the target
is discussed below. From a picture of BPTI (such as
Figure 6-10, p. 225 of CREI84) we can see that residues 3,
7, 8, 10, 13, 39, 41, and 42 can all simultaneously
contact a molecule the size and shape of myoglobin. We
also see that residue 49 cannot touch a single myoglobin
molecule simultaneously with any of the first set even
though all are on the surface of BPTI. (It is not the
intent of the present invention, however, to suggest that
use of models is required to determine which part of the
target molecule will actually be the site of binding by
PBD.)

Variations in the position, orientation and nature of
the side chains of the residues of the interaction set
will alter the shape of the potential binding surface
defined by that set. Any individual combination of such
variations may result in a surface shape which is a better
or a worse fit for the target surface. The effective
diversity of a variegated population is measured by the
number of distinct shapes the potentially complementary
surfaces of the PBD can adopt, rather than the number of
protein sequences. Thus, it is preferable to maximize the
former number, when our knowledge of the IPBD permits us
to do so.

To maximize the number of surface shapes generated
when N residues are varied, all residues varied in a
given round of variegation should be in the same interac-
tion set because variation of several residues in one
interaction set generates an exponential number of
different shapes of the potential binding surface.

If cassette mutagenesis is to be used to introduce
the variegated DNA into the ipbd gene, the protein
residues to be varied are, preferably, close enough
together in sequence that the variegated DNA (vgDNA)
encoding all of them can be made in one piece. The
present invention is not limited to a particular length of
vgDNA that can be synthesized. With current technology, a
stretch of 60 amino acids (180 DNA bases) can be spanned.

Further, when there is reason to mutate residues
further than sixty residues apart, one can use other
mutational means, such as single-stranded-oligonucleotide-
directed mutagenesis (BOTS85) using two or more mutating
primers.

Alternatively, to vary residues separated by more
than sixty residues, two cassettes may be mutated as
follows: 1) vgDNA having a low level of variegation (for
example, 20 to 400 fold variegation) is introduced into
one cassette in the OCV, 2) cells are transformed and cul-
tured, 3) vg OCV DNA is obtained, 4) a second segment of
vgDNA is inserted into a second cassette in the OCV,
and 5) cells are transformed and cultured, GPs are har-
vested and subjected to selection-through-binding.

The composite level of variation preferably does not
exceed the prevailing capabilities to a) produce very
large numbers of independently transformed cells or b)
detect small components in a highly varied population.
The limits on the level of variegation are discussed
later.

Data about the IPBD and the target that are useful in 
deciding which residues to vary in the variegation cycle 
include: 1) 3D structure, or at least a list of residues 
on the surface of the IPBD, 2) list of sequences homolog-
ous to IPBD, and 3) model of the target molecule or a
stand-in for the target. 

These data and an understanding of the behavior of
different amino acids in proteins will be used to answer
two questions:

1) which residues of the IPBD are on the outside and
close enough together in space to touch the target
simultaneously?

2) which residues of the IPBD can be varied with high
probability of retaining the underlying IPBD struc-
ture?

Although an atomic model of the target material
(obtained through X-ray crystallography, NMR, or other
means) is preferred in such examination, it is not
necessary. For example, if the target were a protein of
unknown 3D structure, it would be sufficient to know the
molecular weight of the protein and whether it were a
soluble globular protein, a fibrous protein, or a membrane
protein. Physical measurements, such as low-angle neutron
diffraction, can determine the overall molecular shape,
viz., the ratios of the principal moments of inertia. One
can then choose a protein of known structure of the same
class and similar size and shape to use as a molecular
stand-in and yardstick. It is not essential to measure the
moments of inertia of the target because, at low resolu-
tion, all proteins of a given size and class look much the
same. The specific volumes are the same, all are more or
less spherical and therefore all proteins of the same size
and class have about the same radius of curvature. The
radii of curvature of the two molecules determine how much
of the two molecules can come into contact.
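
The statement that proteins of the same size and class have about the same radius of curvature can be illustrated by treating a globular protein as a sphere of uniform density. The following is a rough sketch only: the partial specific volume of 0.73 cm^3/g is an assumed typical value that does not appear in this specification, and the molecular weight of 17,000 used in the example is an approximate stand-in for a protein the size of myoglobin.

```python
import math

AVOGADRO = 6.02214076e23         # molecules per mole
PARTIAL_SPECIFIC_VOLUME = 0.73   # cm^3 per gram; assumed typical value


def sphere_radius_angstrom(mol_weight_da, v_bar=PARTIAL_SPECIFIC_VOLUME):
    """Radius (in Å) of a sphere with the volume of one protein molecule.

    Assumes a compact, roughly spherical globular protein; real proteins
    deviate from this idealization.
    """
    volume_cm3 = mol_weight_da * v_bar / AVOGADRO
    volume_a3 = volume_cm3 * 1e24            # 1 cm^3 = 1e24 cubic ångströms
    return (3.0 * volume_a3 / (4.0 * math.pi)) ** (1.0 / 3.0)


if __name__ == "__main__":
    for mw in (6500, 17000, 66000):          # illustrative molecular weights
        print(mw, "Da ->", round(sphere_radius_angstrom(mw), 1), "Å")
```

Because the radius grows only as the cube root of the molecular weight, a stand-in of similar size and class gives a usable estimate of how much surface two molecules can bring into contact.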

The most appropriate method of picking the residues
of the protein chain at which the amino acids should be
varied is by viewing, with interactive computer graphics,
a model of the IPBD. A stick-figure representation of
molecules is preferred. A suitable set of hardware is an
Evans & Sutherland PS390 graphics terminal (Evans &
Sutherland Corporation, Salt Lake City, UT) and a MicroVAX
II supermicro computer (Digital Equipment Corp., Maynard,
MA). The computer should, preferably, have at least 150
megabytes of disk storage, so that the Brookhaven Protein
Data Bank can be kept on line. A FORTRAN compiler, or
some equally good higher-level language processor, is
preferred for program development. Suitable programs for
viewing and manipulating protein models include: a) PS-
FRODO, written by T. A. Jones (JONE85) and distributed by
the Biochemistry Department of Rice University, Houston,
TX; and b) PROTEUS, developed by Dayringer, Tramontano,
and Fletterick (DAYR86). Important features of PS-FRODO
and PROTEUS that are needed to view and manipulate protein
models for the purposes of the present invention are the
abilities to: 1) display molecular stick figures of
proteins and other molecules, 2) zoom and clip images in
real time, 3) prepare various abstract representations of
the molecules, such as a line joining Cα's and side group
atoms, 4) compute and display solvent-accessible surfaces
reasonably quickly, 5) point to and identify atoms, and 6)
measure distance between atoms.

In addition, one could use theoretical calculations, 
such as dynamic simulations of proteins, to estimate 
whether a substitution at a particular residue of a 
particular amino-acid type might produce a protein of
approximately the same 3D structure as the parent protein.
Such calculations might also indicate whether a particular 
substitution will greatly affect the flexibility of the 
protein; calculations of this sort may be useful but are 
not required.

Residues whose mutagenesis is most likely to affect
binding to a target molecule, without destabilizing the
protein, are called the "principal set". Using the
knowledge of which residues are on the surface of the IPBD
(as noted above), we pick residues that are close enough
together on the surface of the IPBD to touch a molecule of
the target simultaneously without having any IPBD main-
chain atom come closer than van der Waals distance (viz.
4.0 to 5.0 Å) from any target atom. For the purposes of
the present invention, a residue of the IPBD "touches" the
target if: a) a main-chain atom is within van der Waals
distance, viz. 4.0 to 5.0 Å, of any atom of the target
molecule, or b) the Cβ is within D_cutoff of any atom of
the target molecule so that a side-group atom could make
contact with that atom.

Because side groups differ in size (cf. Table 35),
some judgment is required in picking D_cutoff. In the
preferred embodiment, we will use D_cutoff = 8.0 Å, but
other values in the range 6.0 Å to 10.0 Å could be used.
If IPBD has G at a residue, we construct a pseudo Cβ with
the correct bond distance and angles and judge the
ability of the residue to touch the target from this
pseudo Cβ.
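
The "touching" test defined above reduces to two distance comparisons per target atom. The following is a minimal sketch, assuming atomic coordinates in Å are already available as (x, y, z) tuples; the function name, the argument layout, and the choice of 5.0 Å from the stated 4.0 to 5.0 Å van der Waals range are illustrative assumptions.

```python
import math


def residue_touches(main_chain_atoms, c_beta, target_atoms,
                    vdw_distance=5.0, d_cutoff=8.0):
    """Return True if an IPBD residue "touches" the target.

    Criterion a): any main-chain atom lies within vdw_distance of a
    target atom. Criterion b): the C-beta (or pseudo C-beta for Gly)
    lies within d_cutoff of a target atom. Coordinates are in Å.
    """
    for target in target_atoms:
        if any(math.dist(atom, target) <= vdw_distance
               for atom in main_chain_atoms):
            return True
        if math.dist(c_beta, target) <= d_cutoff:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical coordinates, chosen only to exercise the two criteria.
    backbone = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    c_beta = (1.0, 1.2, 0.0)
    print(residue_touches(backbone, c_beta, [(8.0, 1.2, 0.0)]))
    print(residue_touches(backbone, c_beta, [(20.0, 0.0, 0.0)]))
```

Applying this test to every surface residue against a model of the target, or a stand-in, yields the candidate members of an interaction set.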

Alternatively, we choose a set of residues on the 
surface of the IPBD such that the curvature of the surface 
defined by the residues in the set is not so great that it 
would prevent contact between all residues in the set and 
a molecule of the target. This method is appropriate if 
the target is a macromolecule, such as a protein, because 
the PBDs derived from the IPBD will contact only a part of 
the macromolecular surface. The surfaces of macromole- 
cules are irregular with varying curvatures. If we pick 
residues that define a surface that is not too convex, 
then there will be a region on a macromolecular target 
with a compatible curvature.

In addition to the geometrical criteria, we prefer 
that there be some indication that the underlying IPBD 
structure will tolerate substitutions at each residue in 
the principal set of residues. Indications could come 
from various sources, including: a) homologous sequences, 
b) static computer modeling, or c) dynamic computer 
simulations. 

The residues in the principal set need not be 
contiguous in the protein sequence and usually are not. 
The exposed surfaces of the residues to be varied do not
need to be connected. We desire only that the amino 
acids in the residues to be varied all be capable of 
touching a molecule of the target material simultaneously 
without having atoms overlap. If the target were, for 
example, horse heart myoglobin, and if the IPBD were BPTI, 
any set of residues in one interaction set of BPTI defined 
in Table 34 could be picked. 

The secondary set comprises those residues not in the 
primary set that touch residues in the primary set. These 
residues might be excluded from the primary set because: 
a) the residue is internal, b) the residue is highly
conserved, or c) the residue is on the surface, but the
curvature of the IPBD surface prevents the residue from
being in contact with the target at the same time as one
or more residues in the primary set.

Internal residues are frequently conserved and the 
amino acid type can not be changed to a significantly 
different type without substantial risk that the protein 
structure will be disrupted. Nevertheless, some conserva- 
tive changes of internal residues, such as I to L or F to 
Y, are tolerated. Such conservative changes subtly affect
the placement and dynamics of adjacent protein residues
and such "fine tuning" may be useful once an SBD is found.

Surface residues in the secondary set are most often
located on the periphery of the principal set. Such
peripheral residues cannot make direct contact with the
target simultaneously with all the other residues of the
principal set. The charge on the amino acid in one of
these residues could, however, have a strong effect on
binding. Once an SBD is found, it is appropriate to vary
the charge of some or all of these residues. For example,
the variegated codon containing equimolar A and G at base
1, equimolar C and A at base 2, and A at base 3 yields
amino acids T, A, K, and E with equal probability.
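
The amino acid distribution produced by such a codon can be verified by enumerating the codons it contains and translating each with the standard genetic code. The following is a minimal sketch, assuming equimolar mixtures at each base position; the function and variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def expand_codon(base1, base2, base3):
    """Map each encoded amino acid to its probability.

    Each argument is a string of the bases present, in equimolar
    proportions, at that codon position ('*' denotes a stop codon).
    """
    codons = ["".join(c) for c in product(base1, base2, base3)]
    counts = Counter(CODON_TABLE[c] for c in codons)
    return {aa: n / len(codons) for aa, n in counts.items()}


if __name__ == "__main__":
    # Equimolar A/G at base 1, equimolar C/A at base 2, A at base 3.
    print(expand_codon("AG", "CA", "A"))
```

The same enumeration can be applied to any other simply variegated codon to list its substitution set before the vgDNA is synthesized.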

The assignment of residues to the primary and
secondary sets may be based on: a) geometry of the IPBD
and the geometrical relationship between the IPBD and the
target (or a stand-in for the target) in a hypothetical
complex, and b) sequences of proteins homologous to the
IPBD. However, it should be noted that the distinction
between the principal set and the secondary set is one
more of convenience than of substance; we could just as
easily have assigned each amino acid residue in the domain
a preference score that weighed together the different
considerations affecting whether they are suitable for
variegation, and then ranked the residues in order, from
most preferred to least.

For any given round of variegation, it may be
necessary to limit the variegation to a subset of the
residues in the primary and secondary sets, based on
geometry and on the maximum allowed level of variegation
that assures progressivity. The allowed level of variega-
tion determines how many residues can be varied at once;
geometry determines which ones.

The user may pick residues to vary in many ways. For 
example, pairs of residues are picked that are diametri- 
cally opposed across the face of the principal set. Two 
such pairs are used to delimit the surface, up/down and
right/left. Alternatively, three residues that form an 
inscribed triangle, having as large an area as possible, 
on the surface are picked. One to three other residues 
are picked in a checkerboard fashion across the interac- 
tion surface. Choice of widely spaced residues to vary 
creates the possibility for high specificity because all 
the intervening residues must have acceptable complemen- 
tarity before favorable interactions can occur at widely- 
separated residues. 

The number of residues picked is coupled to the range 
through which each can be varied by the restrictions 
discussed below. In the first round, we do not assume any 
binding between IPBD and the target and so progressivity 
is not an issue. At the first round, the user may elect 
to produce a level of variegation such that each molecule 
of vgDNA is potentially different through, for example, 
unlimited variegation of 10 codons (20^10 ≈ 10^13).
One run of the DNA synthesizer produces approximately 10^13
molecules of length 100 nts. Inefficiencies in ligation
and transformation will reduce the number of proteins
actually tested to between 10^7 and 5·10^8. Multiple
replications of the process with such very high levels of 
variegation will not yield repeatable results; the user 
decides whether this is important. 
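
The arithmetic in the preceding paragraph can be checked with a short sketch. It assumes only what the text states (all 20 amino acids at each of 10 codons, and 10^7 to 5 × 10^8 proteins actually tested); the function and variable names are illustrative.

```python
import math

def variegation_level(variants_per_residue):
    """Number of distinct protein sequences: the product of the number
    of amino acids allowed at each varied residue."""
    return math.prod(variants_per_residue)

# Unlimited variegation (all 20 amino acids) at 10 codons.
level = variegation_level([20] * 10)

# Fraction of that diversity that survives ligation and transformation,
# using the range of tested proteins quoted in the text.
low_fraction = 1e7 / level
high_fraction = 5e8 / level

print(f"possible sequences: {level:.3e}")
print(f"fraction tested: {low_fraction:.1e} to {high_fraction:.1e}")
```

Because well under 0.01% of the possible sequences are tested at this level, two replicate libraries are unlikely to contain the same members, which is why such runs are not repeatable.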

III.C. Determining the Substitution Set for Each Parental 
Residue 

Having picked which residues to vary, we now decide
the range of amino acids to allow at each variable
residue. The total level of variegation is the product of
the number of variants at each varied residue. Each
varied residue can have a different scheme of variegation,
producing 2 to 20 different possibilities. The set of
amino acids which are potentially encoded by a given
variegated codon is called its "substitution set".

The computer that controls a DNA synthesizer, such as
the Milligen 7500, can be programmed to synthesize any
base of an oligo-nt with any distribution of nts by taking
some nt substrates (e.g. nt phosphoramidites) from each of
two or more reservoirs. Alternatively, nt substrates can
be mixed in any ratios and placed in one of the extra
reservoirs for so-called "dirty bottle" synthesis. Each
codon could be programmed differently. The "mix" of bases
at each nucleotide position of the codon determines the
relative frequency of occurrence of the different amino
acids encoded by that codon.

Simply variegated codons are those in which those 
nucleotide positions which are degenerate are obtained 
from a mixture of two or more bases mixed in equimolar 
proportions. These mixtures are described in this 
specification by means of the standardized "ambiguous 
nucleotide" code (Table 1 and 37 CFR §1.822). In this 
code, for example, in the degenerate codon "SNT", "S" 
denotes an equimolar mixture of bases G and C, "N", an 
equimolar mixture of all four bases, and "T", the single
invariant base thymidine.
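
As an illustration of how a simply variegated codon maps to a substitution set, the sketch below expands an ambiguous codon into its DNA sequences and translates each one. It assumes the standard IUPAC ambiguity symbols and the standard genetic code; the function name `substitution_set` is hypothetical and not part of the specification.

```python
from itertools import product

# Standard IUPAC ambiguity symbols for DNA.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons ordered T, C, A, G at each position;
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def substitution_set(degenerate_codon):
    """Return {amino acid: number of DNA codons} for an equimolar
    (simply variegated) codon such as "SNT"."""
    counts = {}
    choices = [IUPAC[symbol] for symbol in degenerate_codon]
    for codon in product(*choices):
        aa = GENETIC_CODE["".join(codon)]
        counts[aa] = counts.get(aa, 0) + 1
    return counts

print(substitution_set("SNT"))
```

Because every base in a simply variegated mixture is equimolar, the codon counts returned here are proportional to the expected frequency of each amino acid.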

Complexly variegated codons are those in which at
least one of the three positions is filled by a base from
a mixture of two or more bases in other than equimolar
proportions.

Either simply or complexly variegated codons may be 
used to achieve the desired substitution set. 

If we have no information indicating that a particular
amino acid or class of amino acid is appropriate, we
strive to substitute all amino acids with equal
probability because representation of one mini-protein
above the detectable level is wasteful. Equal amounts of
all four nts at each position in a codon (NNN) yield the
amino acid distribution in which each amino acid is present
in proportion to the number of codons that code for it.
This distribution has the disadvantage of giving two basic
residues for every acidic residue. In addition, six times
as much R, S, and L as W or M occur. If five codons are
synthesized with this distribution, each of the 243
sequences encoding some combination of L, R, and S is
7776 times more abundant than each of the 32 sequences
encoding some combination of W and M. To have five Ws
present at detectable levels, we must have each of the
(L,R,S) sequences present in 7776-fold excess.
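
The bias of an NNN codon can be tallied directly from the standard genetic code. The sketch below is a minimal check of the figures in the preceding paragraph; it assumes that "basic" means LYS and ARG (HIS is not counted) and "acidic" means ASP and GLU.

```python
from collections import Counter

# Standard genetic code written as one letter per codon (codon order
# TCAG x TCAG x TCAG); "*" marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codons_per_aa = Counter(AMINO_ACIDS)

basic = codons_per_aa["K"] + codons_per_aa["R"]
acidic = codons_per_aa["D"] + codons_per_aa["E"]

n = 5                                   # number of NNN codons in a row
lrs_sequences = 3 ** n                  # peptides built only from L, R, S
wm_sequences = 2 ** n                   # peptides built only from W, M
excess = (codons_per_aa["L"] // codons_per_aa["W"]) ** n

print(f"basic:acidic codons = {basic}:{acidic}")
print(f"{lrs_sequences} (L,R,S) peptides, {wm_sequences} (W,M) peptides")
print(f"each (L,R,S) peptide is {excess}-fold more abundant")
```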

Preferably, we also consider the interactions between
the sites of variegation and the surrounding DNA. If the
method of mutagenesis to be used is replacement of a
cassette, we consider whether the variegation will
generate gratuitous restriction sites and whether they
seriously interfere with the intended introduction of
diversity. We reduce or eliminate gratuitous restriction
sites by appropriate choice of variegation pattern and
silent alteration of codons neighboring the sites of
variegation.

It is generally accepted that the sequence of amino
acids in a protein or polypeptide determines the three-
dimensional structure of the molecule, including the
possibility of no definite structure. Among polypeptides
of definite length and sequence, some have a defined
tertiary structure and most do not.

Particular amino acid residues can influence the
tertiary structure of a defined polypeptide in several
ways, including by:

a) affecting the flexibility of the polypeptide main
chain,

b) adding hydrophobic groups,

c) adding charged groups,

d) allowing hydrogen bonds, and

e) forming cross-links, such as disulfides, chelation to
metal ions, or bonding to prosthetic groups.

Most works on proteins classify the twenty amino acids
into categories such as hydrophobic/hydrophilic,
positive/negative/neutral, or large/small. These
classifications are useful rules of thumb, but one must be
careful not to oversimplify. Proteins contain a variety of
identifiable secondary structural features, including: a)
α helices, b) 3-10 helices, c) anti-parallel β sheets, d)
parallel β sheets, e) Ω loops, f) reverse turns, and g)
various cross links. Many people have analyzed proteins of
known structures and assigned each amino acid to one
category or another. Using the frequency at which
particular amino acids occur in various types of secondary
structures, people have a) tried to predict the secondary
structures of proteins for which only the amino-acid
sequence is known (CHOU74, CHOU78a, CHOU78b), and b)
designed proteins de novo that have a particular set of
secondary structural elements (DEGR87, HECH90). Although
some amino acids show definite predilection for one
secondary form (e.g. VAL for β structure and ALA for α
helices), these preferences are not very strong; Creighton
has tabulated the preferences (CREI84). In only seven
cases does the tendency exceed 2.0:

    Amino acid    distinction    ratio

    MET           α/turn         3.7
    PRO           turn/α         3.7
    VAL           β/turn         3.2
    GLY           turn/α         2.9
    ILE           β/turn         2.8
    PHE           β/turn         2.3
    LEU           α/turn         2.2


Every amino-acid type has been observed in every
identified secondary structural motif. ARG is particularly
indiscriminate.

PRO is generally taken to be a helix breaker.
Nevertheless, proline often occurs at the beginning of
helices or even in the middle of a helix, where it
introduces a slight bend in the helix. Matthews and
coworkers replaced a PRO that occurs near the middle of an
α helix in T4 lysozyme. To their surprise, the "improved"
protein is less stable than the wild-type. The rest of
the structure had been adapted to fit the bent helix.

Lundeen (LUND86) has tabulated the frequencies of
amino acids in helices, β strands, turns, and coil in
proteins of known 3D structure and has distinguished
between CYSs having free thiol groups and half cystines.
He reports that free CYS is found most often in helices
while half cystines are found more often in β sheets.
Half cystines are, however, regularly found in helices.
Pease et al. (PEAS90) constructed a peptide having two
cystines; one end of each is in a very stable α helix.
Apamin has a similar structure (WEMM83, PEAS88).

Flexibility:

GLY is the smallest amino acid, having two hydrogens
attached to the Cα. Because GLY has no Cβ, it confers
the most flexibility on the main chain. Thus GLY occurs
very frequently in reverse turns, particularly in
conjunction with PRO, ASP, ASN, SER, and THR.

The amino acids ALA, SER, CYS, ASP, ASN, LEU, MET,
PHE, TYR, TRP, ARG, HIS, GLU, GLN, and LYS have unbranched
β carbons. Of these, the side groups of SER, ASP, and ASN
frequently make hydrogen bonds to the main chain and so
can take on main-chain conformations that are
energetically unfavorable for the others. VAL, ILE, and
THR have branched β carbons, which makes the extended
main-chain conformation more favorable. Thus VAL and ILE
are most often seen in β sheets. Because the side group of
THR can easily form hydrogen bonds to the main chain, it
has less tendency to exist in a β sheet.

The main chain of proline is particularly constrained
by the cyclic side group. The φ angle is always close to
-60°. Most prolines are found near the surface of the
protein.

Charge:

LYS and ARG carry a single positive charge at any pH
below 10.4 or 12.0, respectively. Nevertheless, the
methylene groups, four and three respectively, of these
amino acids are capable of hydrophobic interactions. The
guanidinium group of ARG is capable of donating five
hydrogens simultaneously, while the amino group of LYS can
donate only three. Furthermore, the geometries of these
groups are quite different, so that these groups are often
not interchangeable.

ASP and GLU carry a single negative charge at any pH
above ≈4.5 and 4.6, respectively. Because ASP has but one
methylene group, few hydrophobic interactions are
possible. The geometry of ASP lends itself to forming
hydrogen bonds to main-chain nitrogens, which is consistent
with ASP being found very often in reverse turns and at
the beginning of helices. GLU is more often found in α
helices and particularly in the amino-terminal portion of
these helices because the negative charge of the side
group has a stabilizing interaction with the helix dipole
(NICH88, SALI88).

HIS has an ionization pK in the physiological range,
viz. 6.2. This pK can be altered by the proximity of
charged groups or of hydrogen donors or acceptors. HIS
is capable of forming bonds to metal ions such as zinc,
copper, and iron.

Hydrogen bonds:

Aside from the charged amino acids, SER, THR, ASN,
GLN, TYR, and TRP can participate in hydrogen bonds.

Cross links:

The most important form of cross link is the disulfide
bond formed between two thiols, especially the
thiols of CYS residues. In a suitably oxidizing
environment, these bonds form spontaneously. These bonds
can greatly stabilize a particular conformation of a
protein or mini-protein. When a mixture of oxidized and
reduced thiol reagents is present, exchange reactions take
place that allow the most stable conformation to
predominate. Concerning disulfides in proteins and
peptides, see also KATZ90, MATS89, PERR84, PERR86, SAUE86,
WELL86, JANA89, HORV89, KISH85, and SCHN86.

Other cross links that form without need of specific
enzymes include:

    1) (CYS)4:Fe             Rubredoxin (in CREI84, p. 376)
    2) (CYS)4:Zn             Aspartate Transcarbamylase (in
                             CREI84, p. 376) and Zn-fingers
                             (HARD90)
    3) (HIS)2(MET)(CYS):Cu   Azurin (in CREI84, p. 376) and
                             Basic "Blue" Cu Cucumber
                             protein (GUSS88)
    4) (HIS)4:Cu             CuZn superoxide dismutase
    5) (CYS)4:(Fe4S4)        Ferredoxin (in CREI84, p. 376)
    6) (CYS)2(HIS)2:Zn       Zinc-fingers (GIBS88)
    7) (CYS)3(HIS):Zn        Zinc-fingers (GAUS87, GIBS88)

Cross links having (HIS)2(MET)(CYS):Cu have the potential
advantage that HIS and MET can not form other cross links
without Cu.

Simply Variegated Codons

The following simply variegated codons are useful
because they encode a relatively balanced set of amino
acids:


1) SNT which encodes the set [L,P,H,R,V,A,D,G]: a) one
   acidic (D) and one basic (R), b) both aliphatic
   (L,V) and aromatic hydrophobics (H), c) large
   (L,R,H) and small (G,A) side groups, d) rigid (P)
   and flexible (G) amino acids, e) each amino acid
   encoded once.

2) RNG which encodes the set [M,T,K,R,V,A,E,G]: a) one
   acidic and two basic (not optimal, but acceptable),
   b) hydrophilics and hydrophobics, c) each amino acid
   encoded once.

3) RMG which encodes the set [T,K,A,E]: a) one acidic,
   one basic, one neutral hydrophilic, b) three favor α
   helices, c) each amino acid encoded once.

4) VNT which encodes the set [L,P,H,R,I,T,N,S,V,A,D,G]:
   a) one acidic, one basic, b) all classes: charged,
   neutral hydrophilic, hydrophobic, rigid and
   flexible, etc., c) each amino acid encoded once.

5) RRS which encodes the set [N,S,K,R,D,E,G2]: a) two
   acidics, two basics, b) two neutral hydrophilics, c)
   only glycine encoded twice.

6) NNT which encodes the set
   [F,S2,Y,C,L,P,H,R,I,T,N,V,A,D,G]: a) sixteen DNA
   sequences provide fifteen different amino acids; only
   serine is repeated, all others are present in equal
   amounts (this allows very efficient sampling of the
   library), b) there are equal numbers of acidic and
   basic amino acids (D and R, once each), c) all major
   classes of amino acids are present: acidic, basic,
   aliphatic hydrophobic, aromatic hydrophobic, and
   neutral hydrophilic.

7) NNG, which encodes the set
   [L2,R2,S,W,P,Q,M,T,K,V,A,E,G,stop]: a) fair
   preponderance of residues that favor formation of
   α helices [L,M,A,Q,K,E; and, to a lesser extent,
   S,R,T]; b) encodes 13 different amino acids. (VHG
   encodes a subset of the set encoded by NNG which
   encodes 9 amino acids in nine different DNA
   sequences, with equal acids and bases, and 5/9 being
   α helix-favoring.)

For the initial variegation, NNT is preferred in
most cases. However, when the codon is encoding an amino
acid to be incorporated into an α helix, NNG is preferred.

Below, we analyze several simple variegations as to 
the efficiency with which the libraries can be sampled. 

Libraries of random hexapeptides encoded by (NNK)6
have been reported (SCOT90, CWIR90). Table 130 shows the
expected behavior of such libraries. NNK produces single
codons for PHE, TYR, CYS, TRP, HIS, GLN, ILE, MET, ASN,
LYS, ASP, and GLU (α set); two codons for each of VAL,
ALA, PRO, THR, and GLY (Φ set); and three codons for each
of LEU, ARG, and SER (Ω set). We have separated the
64,000,000 possible sequences into 28 classes, shown in
Table 130A, based on the number of amino acids from each
of these sets. The largest class is ΦΩαααα with ≈14.6% of
the possible sequences. Aside from any selection, all the
sequences in one class have the same probability of being
produced. Table 130B shows the probability that a given
DNA sequence taken from the (NNK)6 library will encode a
hexapeptide belonging to one of the defined classes; note
that only ≈6.3% of DNA sequences belong to the ΦΩαααα
class.

Table 130C shows the expected numbers of sequences in
each class for libraries containing various numbers of
independent transformants (viz. 10^6, 3 × 10^6, 10^7,
3 × 10^7, 10^8, 3 × 10^8, 10^9, and 3 × 10^9). At 10^6
independent transformants (ITs), we expect to see 56% of
the ΩΩΩΩΩΩ class, but only 0.1% of the αααααα class. The
vast majority of sequences seen come from classes for
which less than 10% of the class is sampled. Suppose a
peptide from, for example, class ΦΦΩΩαα is isolated by
fractionating the library for binding to a target.
Consider how much we know about peptides that are related
to the isolated sequence. Because only 4% of the ΦΦΩΩαα
class was sampled, we can not conclude that the amino
acids from the Ω set are in fact the best from the Ω set.
We might have LEU at position 2, but ARG or SER could be
better. Even if we isolate a peptide of the ΩΩΩΩΩΩ class,
there is a noticeable chance that better members of the
class were not present in the library.

With a library of 10^7 ITs, we see that several
classes have been completely sampled, but that the αααααα
class is only 1.1% sampled. At 7.6 × 10^7 ITs, we expect
display of 50% of all amino-acid sequences, but the
classes containing three or more amino acids of the α set
are still poorly sampled. To achieve complete sampling of
the (NNK)6 library requires about 3 × 10^9 ITs, 10-fold
larger than the largest (NNK)6 library so far reported.
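
The sampling percentages quoted above are consistent with a simple Poisson model, sketched here under stated assumptions: each independent transformant is treated as a uniform random draw from the 31^6 non-stop DNA sequences of (NNK)6 (the "stops vanish" case), and the one-codon, two-codon, and three-codon sets are called alpha, phi, and omega. The function names are illustrative and the sketch does not reproduce Table 130.

```python
from math import exp, factorial

# Assumption: stop codons "vanish", so (NNK)6 offers 31**6 DNA sequences.
TOTAL_DNA = 31 ** 6
# Number of amino acids in each set; the NNK codons per amino acid
# are 1 (alpha), 2 (phi) and 3 (omega).
SET_SIZE = {"alpha": 12, "phi": 5, "omega": 3}

def classes(length=6):
    """Yield every (n_alpha, n_phi, n_omega) composition of a peptide."""
    for n_alpha in range(length + 1):
        for n_phi in range(length + 1 - n_alpha):
            yield n_alpha, n_phi, length - n_alpha - n_phi

def class_size(n_alpha, n_phi, n_omega):
    """Number of amino-acid sequences belonging to a class."""
    total = n_alpha + n_phi + n_omega
    arrangements = factorial(total) // (
        factorial(n_alpha) * factorial(n_phi) * factorial(n_omega))
    return (arrangements * SET_SIZE["alpha"] ** n_alpha
            * SET_SIZE["phi"] ** n_phi * SET_SIZE["omega"] ** n_omega)

def fraction_sampled(n_alpha, n_phi, n_omega, transformants):
    """Expected fraction of a class present among the transformants."""
    dna_per_peptide = 2 ** n_phi * 3 ** n_omega
    return 1 - exp(-transformants * dna_per_peptide / TOTAL_DNA)

all_classes = list(classes())
largest = max(all_classes, key=lambda c: class_size(*c))
print(len(all_classes), "classes; largest (alpha, phi, omega):", largest)
for name, cls in [("omega x6", (0, 0, 6)),
                  ("alpha2 phi2 omega2", (2, 2, 2)),
                  ("alpha x6", (6, 0, 0))]:
    pct = 100 * fraction_sampled(*cls, 1e6)
    print(f"{name}: {pct:.2f}% sampled at 1e6 ITs")
```

The key point of the model is that a peptide's chance of appearing depends only on how many DNA sequences encode it, so classes rich in one-codon amino acids lag far behind at any library size.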

Table 131 shows expectations for a library encoded by
(NNT)4(NNG)2. The expectations of abundance are
independent of the order of the codons or of interspersed
unvaried codons. This library encodes 0.133 times as many
amino-acid sequences, but there are only 0.0165 times as
many DNA sequences. Thus 5.0 × 10^7 ITs (i.e. 60-fold fewer
than required for (NNK)6) gives almost complete sampling
of the library. The results would be slightly better for
(NNT)6 and slightly, but not much, worse for (NNG)6. The
controlling factor is the ratio of DNA sequences to amino-
acid sequences.

Table 132 shows the ratio of #DNA sequences/#AA
sequences for codons NNK, NNT, and NNG. For NNK and NNG,
we have assumed that the PBD is displayed as part of an
essential gene, such as gene III in Ff phage, as is
indicated by the phrase "assuming stops vanish". It is
not in any way required that such an essential gene be
used. If a non-essential gene is used, the analysis would
be slightly different; sampling of NNK and NNG would be
slightly less efficient. Note that (NNT)6 gives 3.6-fold
more amino-acid sequences than (NNK)5 but requires 1.7-
fold fewer DNA sequences. Note also that (NNT)7 gives
twice as many amino-acid sequences as (NNK)6, but 3.3-fold
fewer DNA sequences.
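
The library-size comparisons above follow from counting codons and amino acids per varied position. The sketch below assumes the "stops vanish" case for NNK and NNG (the TAG codon is dropped from the DNA count); the table `CODON_STATS` and the function `library_size` are illustrative names, not taken from Tables 131 or 132.

```python
# (DNA codons, amino acids) per varied position, with stop codons
# assumed to vanish (display as part of an essential gene).
CODON_STATS = {
    "NNK": (31, 20),   # 32 codons minus TAG
    "NNT": (16, 15),   # no stop codons; serine encoded twice
    "NNG": (15, 13),   # 16 codons minus TAG
}

def library_size(pattern):
    """pattern is a list of (codon name, repeat count) pairs.
    Returns (number of DNA sequences, number of amino-acid sequences)."""
    dna = aa = 1
    for name, repeat in pattern:
        n_dna, n_aa = CODON_STATS[name]
        dna *= n_dna ** repeat
        aa *= n_aa ** repeat
    return dna, aa

nnk6 = library_size([("NNK", 6)])
mixed = library_size([("NNT", 4), ("NNG", 2)])
print(f"AA ratio  (NNT)4(NNG)2 / (NNK)6: {mixed[1] / nnk6[1]:.4f}")
print(f"DNA ratio (NNT)4(NNG)2 / (NNK)6: {mixed[0] / nnk6[0]:.4f}")

nnt6 = library_size([("NNT", 6)])
nnk5 = library_size([("NNK", 5)])
print(f"(NNT)6 vs (NNK)5: {nnt6[1] / nnk5[1]:.1f}-fold more AA sequences, "
      f"{nnk5[0] / nnt6[0]:.1f}-fold fewer DNA sequences")
```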

Thus, while it is possible to use a simple mixture 
(NNS, NNK or NNN) to obtain at a particular position all 
twenty amino acids, these simple mixtures lead to a highly 
biased set of encoded amino acids. This problem can be 
overcome by use of complexly variegated codons. 

Complexly Variegated Codons 

Let Abun(x) be the abundance of DNA sequences coding
for amino acid x, defined by the distribution of nts at
each base of the codon. For any distribution, there will
be a most-favored amino acid (mfaa) with abundance
Abun(mfaa) and a least-favored amino acid (lfaa) with
abundance Abun(lfaa). We seek the nt distribution that
allows all twenty amino acids and that yields the largest
ratio Abun(lfaa)/Abun(mfaa) subject, if desired, to
further constraints.

We first will present the mixture calculated to be
optimal when the nt distribution is subject to two
constraints: equal abundances of acidic and basic amino
acids and the least possible number of stop codons. Thus
only nt distributions that yield Abun(E)+Abun(D) =
Abun(R)+Abun(K) are considered, and the function maximized
is:

    {(1 - Abun(stop)) (Abun(lfaa)/Abun(mfaa))}.

We have simplified the search for an optimal nt
distribution by limiting the third base to T or G (C or G
is equivalent). All amino acids are possible and the
number of accessible stop codons is reduced because TGA
and TAA codons are eliminated. The amino acids F, Y, C, H,
N, I, and D require T at the third base while W, M, Q, K,
and E require G. Thus we use an equimolar mixture of T and
G at the third base. However, it should be noted that the
present invention embraces use of complexly variegated
codons in which the third base is not limited to T or G
(or to C or G).

A computer program, written as part of the present
invention and named "Find Optimum vgCodon" (see Table 9),
varies the composition at bases 1 and 2, in steps of 0.05,
and reports the composition that gives the largest value
of the quantity {(Abun(lfaa)/Abun(mfaa)) (1 - Abun(stop))}.
A vg codon is symbolically defined by the nucleotide
distribution at each base:

                T     C     A     G
    base #1 =   t1    c1    a1    g1
    base #2 =   t2    c2    a2    g2
    base #3 =   t3    c3    a3    g3

    t1 + c1 + a1 + g1 = 1.0
    t2 + c2 + a2 + g2 = 1.0
    t3 = g3 = 0.5, c3 = a3 = 0


The variation of the quantities t1, c1, a1, g1, t2, c2,
a2, and g2 is subject to the constraint that:

    Abun(E) + Abun(D) = Abun(K) + Abun(R)
    Abun(E) + Abun(D) = g1*a2
    Abun(K) + Abun(R) = a1*a2/2 + c1*g2 + a1*g2/2
    g1*a2 = a1*a2/2 + c1*g2 + a1*g2/2

Solving for g2, we obtain

    g2 = (g1*a2 - 0.5*a1*a2)/(c1 + 0.5*a1)

In addition,

    t1 = 1 - a1 - c1 - g1
    t2 = 1 - a2 - c2 - g2

We vary a1, c1, g1, a2, and c2 and then calculate t1, g2,
and t2. Initially, variation is in steps of 5%. Once an
approximately optimum distribution of nucleotides is
determined, the region is further explored with steps of
1%. The logic of this program is shown in Table 9. The
optimum distribution (the "fxS" codon) is shown in Table
10A and yields DNA molecules encoding each type of amino
acid with the abundances shown.
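
The search described above can be sketched as follows. This is a minimal illustration written from the equations in the text, not the program of Table 9: the function names are hypothetical, the coarse step of 0.1 is chosen only to keep the example fast, and no attempt is made to reproduce the values of Table 10A.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def abundances(base1, base2, base3):
    """Abun(x) for every amino acid (and '*' for stop), given the
    nucleotide fractions at each of the three codon positions."""
    abun = {}
    for codon, aa in GENETIC_CODE.items():
        p = base1[codon[0]] * base2[codon[1]] * base3[codon[2]]
        abun[aa] = abun.get(aa, 0.0) + p
    return abun

def constrained_codon(a1, c1, g1, a2, c2):
    """Derive g2, t1 and t2 so that Abun(E)+Abun(D) = Abun(K)+Abun(R)."""
    g2 = (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)
    base1 = {"T": 1 - a1 - c1 - g1, "C": c1, "A": a1, "G": g1}
    base2 = {"T": 1 - a2 - c2 - g2, "C": c2, "A": a2, "G": g2}
    base3 = {"T": 0.5, "C": 0.0, "A": 0.0, "G": 0.5}
    return base1, base2, base3

def score(base1, base2, base3):
    """(Abun(lfaa)/Abun(mfaa)) (1 - Abun(stop)); 0 for invalid mixtures."""
    if min(list(base1.values()) + list(base2.values())) < 0:
        return 0.0
    abun = abundances(base1, base2, base3)
    stop = abun.pop("*", 0.0)
    return (1 - stop) * min(abun.values()) / max(abun.values())

def grid_search(step=0.1):
    """Vary a1, c1, g1, a2, c2 on a grid and keep the best score."""
    n = round(1 / step)
    best = (0.0, None)
    for i, j, k in product(range(1, n), repeat=3):      # a1, c1, g1
        if i + j + k >= n:
            continue
        for l, m in product(range(1, n), repeat=2):     # a2, c2
            if l + m >= n:
                continue
            params = (i * step, j * step, k * step, l * step, m * step)
            s = score(*constrained_codon(*params))
            if s > best[0]:
                best = (s, params)
    return best

best_score, best_params = grid_search(0.1)
print("coarse optimum (a1, c1, g1, a2, c2):", best_params)
print(f"score: {best_score:.3f}")
```

A finer search would repeat `grid_search` with smaller steps restricted to the neighborhood of the coarse optimum, as the text describes for the 5% and 1% passes.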

Note that this chemistry encodes all twenty amino
acids, with acidic and basic amino acids being equi-
probable, and the most favored amino acid (serine) is
encoded only 2.454 times as often as the least favored
amino acid (tryptophan). The "fxS" vg codon improves
sampling most for peptides containing several of the amino
acids [F,Y,C,W,H,Q,I,M,N,K,D,E] for which NNK or NNS
provide only one codon. Its sampling advantages are most
pronounced when the library is relatively small.

A modification of "Find Optimum vgCodon" varies the
composition at bases 1 and 2, in steps of 0.01, and
reports the composition that gives the largest value of
the quantity {(Abun(lfaa)/Abun(mfaa))} without any
restraint on the relative abundance of any amino acids.
The results of this optimization are shown in Table 10B.
The changes are small, indicating that insisting on
equality of acids and bases and minimizing stop codons
costs us little. Also note that, without restraining the
optimization, the prevalences of acidic and basic amino
acids come out fairly close. On the other hand, relaxing
the restriction leaves a distribution in which the least
favored amino acid is only .412 times as prevalent as SER.

The advantages of an NNT codon are discussed
elsewhere in the present application. Unoptimized NNT
provides 15 amino acids encoded by only 16 DNA sequences.
It is possible to improve on NNT as follows. First note
that the SER codons occur in the T and A rows of the
genetic-code table and in the C and G columns.

    [SER] = T1 × C2 + A1 × G2

If we reduce the prevalence of SER by reducing T1, C2, A1,
and G2 relative to other bases, then we will also reduce
the prevalence of PHE, TYR, CYS, PRO, THR, ALA, ARG, GLY,
ILE, and ASN. The prevalence of LEU, HIS, VAL, and ASP
will rise. If we assume that T1, C2, A1, and G2 are all
lowered to the same extent and that C1, G1, T2, and A2 are
increased by the same amount, we can compute a shift that
makes the prevalence of SER equal the prevalences of LEU,
HIS, VAL, and ASP. The decrease in PHE, TYR, CYS, PRO,
THR, ALA, ARG, GLY, ILE, and ASN is not equal; CYS and THR
are reduced more than the others.

Let the distribution be

                 T        C        A        G
    base #1 =  .25-q    .25+q    .25-q    .25+q
    base #2 =  .25+q    .25-q    .25+q    .25-q
    base #3 =  1.00     0.0      0.0      0.0

Setting [SER] = [LEU] = [HIS] = [VAL] = [ASP] gives:

    (.25-q)(.25-q) + (.25-q)(.25-q) = (.25+q)(.25+q)
    2 (.25-q)^2 = (.25+q)^2
    q^2 - 1.5 q + .0625 = 0
    q = (3/4) - sqrt(2)/2 = .0429
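
The derivation above can be checked numerically. The sketch below assumes only the distribution just given (T fixed at the third base) and the standard genetic code; it evaluates one representative amino acid from each abundance group.

```python
import math

# Smaller root of q^2 - 1.5 q + 0.0625 = 0.
q = 0.75 - math.sqrt(2) / 2

base1 = {"T": 0.25 - q, "C": 0.25 + q, "A": 0.25 - q, "G": 0.25 + q}
base2 = {"T": 0.25 + q, "C": 0.25 - q, "A": 0.25 + q, "G": 0.25 - q}

# With T fixed at the third base, each abundance is a product of the
# fractions at bases 1 and 2.
ser = base1["T"] * base2["C"] + base1["A"] * base2["G"]   # TCT + AGT
leu = base1["C"] * base2["T"]                             # CTT
phe = base1["T"] * base2["T"]                             # TTT
thr = base1["A"] * base2["C"]                             # ACT

print(f"q = {q:.4f}")
print(f"SER = {ser:.4f}, LEU = {leu:.4f}")
print(f"PHE/SER = {phe / ser:.3f}, THR/SER = {thr / ser:.3f}")
```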

This distribution (shown in Table 10C) gives five
amino acids (SER, LEU, HIS, VAL, ASP) in very nearly equal
amounts. A further eight amino acids (PHE, TYR, ILE, ASN,
PRO, ALA, ARG, GLY) are present at about 71% the abundance
of SER. THR and CYS remain at half the abundance of SER.
When variegating DNA for disulfide-bonded mini-proteins,
it is often desirable to reduce the prevalence of CYS.
This distribution allows 13 amino acids to be seen at high
level and gives no stops; the optimized fxS distribution
allows only 11 amino acids at high prevalence.

The NNG codon can also be optimized. Table 10D shows
an approximately optimized NNG codon. When equimolar
T,C,A,G are used in NNG, one obtains double doses of LEU
and ARG. To improve the distribution, we increase G1 by
4δ, decrease T1 and A1 by δ each and C1 by 2δ. We adopt
this pattern because C1 affects both LEU and ARG while T1
and A1 each affect either LEU or ARG, but not both.
Similarly, we decrease T2 and G2 by r while we increase C2
and A2 by r. We adjusted δ and r until [ALA] ≈ [ARG].
There are, under this variegation, four equally most
favored amino acids: LEU, ARG, ALA, and GLU. Note that
there is one acidic and one basic amino acid in this set.
There are two equally least favored amino acids: TRP and
MET. The ratio of lfaa/mfaa is 0.5258. If this codon is
repeated six times, peptides composed entirely of TRP and
MET are 2% as common as peptides composed entirely of the
most favored amino acids. We refer to this as "the
prevalence of (TRP/MET)6 in optimized NNG6 vgDNA".

When synthesizing vgDNA by the "dirty bottle" method,
it is sometimes desirable to use only a limited number of
mixes. One very useful mixture is called the "optimized
NNS mixture" in which we average the first two positions
of the fxS mixture: T1 = 0.24, C1 = 0.17, A1 = 0.33, G1 =
0.26, the second position is identical to the first, C3 =
G3 = 0.5. This distribution provides the amino acids ARG,
SER, LEU, GLY, VAL, THR, ASN, and LYS at greater than 5%
plus ALA, ASP, GLU, ILE, MET, and TYR at greater than 4%.
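
The amino-acid abundances produced by this mixture can be tabulated from the stated base fractions. The sketch assumes the standard genetic code and uses the rounded fractions quoted above; with those rounded values ILE, MET, and TYR come out at 3.96%, that is, at about 4%.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# "Optimized NNS mixture": the same fractions at bases 1 and 2,
# equimolar C and G at base 3.
first_two = {"T": 0.24, "C": 0.17, "A": 0.33, "G": 0.26}
third = {"T": 0.0, "C": 0.5, "A": 0.0, "G": 0.5}

abundance = {}
for codon, aa in GENETIC_CODE.items():
    p = first_two[codon[0]] * first_two[codon[1]] * third[codon[2]]
    abundance[aa] = abundance.get(aa, 0.0) + p

# List amino acids (and '*' for stop) from most to least abundant.
for aa, p in sorted(abundance.items(), key=lambda item: -item[1]):
    print(f"{aa}: {100 * p:.2f}%")
```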

An additional complexly variegated codon is of
interest. This codon is identical to the optimized NNT
codon at the first two positions and has T:G::90:10 at the
third position. This codon provides thirteen amino acids
(ALA, ILE, ARG, SER, ASP, LEU, VAL, PHE, ASN, GLY, PRO,
TYR, and HIS) at more than 5.5%. THR at 4.3% and CYS at
3.9% are more common than the LFAAs of NNK (3.125%). The
remaining five amino acids are present at less than 1%.
This codon has the feature that all amino acids are
present; sequences having more than two of the low-
abundance amino acids are rare. When we isolate an SBD
using this codon, we can be reasonably sure that the first
13 amino acids were tested at each position. A similar
codon, based on optimized NNG, could be used.

Table 10E shows some properties of an unoptimized NNS
(or NNK) codon. Note that there are three equally most-
favored amino acids: ARG, LEU, and SER. There are also
twelve equally least favored amino acids: PHE, ILE, MET,
TYR, HIS, GLN, ASN, LYS, ASP, GLU, CYS, and TRP. Five
amino acids (PRO, THR, ALA, VAL, GLY) fall in between.
Note that a six-fold repetition of NNS gives sequences
composed of the amino acids [PHE, ILE, MET, TYR, HIS, GLN,
ASN, LYS, ASP, GLU, CYS, and TRP] at only ≈0.1% of the
sequences composed of [ARG, LEU, and SER]. Not only is
this ≈20-fold lower than the prevalence of (TRP/MET)6 in
optimized NNG6 vgDNA, but this low prevalence applies to
twelve amino acids.
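
The two prevalence figures compared above are sixth powers of a per-codon ratio. The sketch below takes the 0.5258 ratio from the text and assumes a one-codon versus three-codon ratio of 1/3 for unoptimized NNS; with these inputs the NNG prevalence is about 15 times the NNS prevalence, the same order as the approximately 20-fold figure quoted.

```python
nng_ratio = 0.5258      # lfaa/mfaa for the optimized NNG codon (from the text)
nns_ratio = 1 / 3       # one codon (e.g. TRP) versus three (e.g. LEU) in NNS

length = 6              # number of variegated codons in a row
nng_prevalence = nng_ratio ** length
nns_prevalence = nns_ratio ** length
fold = nng_prevalence / nns_prevalence

print(f"optimized NNG6: {100 * nng_prevalence:.1f}%")
print(f"unoptimized NNS6: {100 * nns_prevalence:.2f}%")
print(f"ratio: {fold:.1f}-fold")
```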

Diffuse Mutagenesis

Diffuse Mutagenesis can be applied to any part of the
protein at any time, but is most appropriate when some
binding to the target has been established. Diffuse
Mutagenesis can be accomplished by spiking each of the
pure nts activated for DNA synthesis (e.g. nt-
phosphoramidites) with a small amount of one or more of
the other activated nts.

Contrary to general practice, the present invention
sets the level of spiking so that only a small percentage
(1% to .00001%, for example) of the final product will
contain the initial DNA sequence. This will ensure that
many single, double, triple, and higher mutations occur,
but that recovery of the basic sequence will be a possible
outcome. Let N_b be the number of bases to be varied, and
let Q be the fraction of all sequences that should have
the parental sequence; then M, the fraction of the mixture
that is the majority component, is

    M = exp{ log_e(Q)/N_b } = 10^( log_10(Q)/N_b )

If, for example, thirty base pairs on the DNA chain were
to be varied and 1% of the product is to have the parental
sequence, then each mixed nt substrate should contain 86%
of the parental nt and 14% of other nts. Table 8 shows
the fraction (fn) of DNA molecules having n non-parental
bases when 30 bases are synthesized with reagents that
contain fraction M of the majority component. When
M=.63096, f24 and higher are less than 10^-8. The entry
"most" in Table 8 is the number of changes that has the
highest probability. Note that substantial probability
for multiple substitutions only occurs if the fraction of
parental sequence (f0) is allowed to drop to around 10^-6.
The N_b base pairs of the DNA chain that are synthesized
with mixed reagents need not be contiguous. They are
picked so that between N_b/3 and N_b codons are affected to
various degrees. The residues picked for mutation are
picked with reference to the 3D structure of the IPBD, if
known. For example, one might pick all or most of the
residues in the principal and secondary set. We may
impose restrictions on the extent of variation at each of
these residues based on homologous sequences or other
data. The mixture of non-parental nts need not be random;
rather, mixtures can be biased to give particular amino
acid types specific probabilities of appearance at each
codon. For example, one residue may contain a hydrophobic
amino acid in all known homologous sequences; in such a
case, the first and third base of that codon would be
varied, but the second would be set to T. Other examples
of how this might be done are given in the horse heart
myoglobin example. This diffuse structure-directed
mutagenesis will reveal the subtle changes possible in
protein backbone associated with conservative interior
changes, such as V to I, as well as some not so subtle
changes that require concomitant changes at two or more
residues of the protein.
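
The spiking calculation for Diffuse Mutagenesis can be sketched as follows. The sketch assumes that every varied base is spiked to the same level and mutates independently, so that the number of non-parental bases follows a binomial distribution; the function names are illustrative and Table 8 itself is not reproduced.

```python
import math

def majority_fraction(q_parental, n_bases):
    """M such that M ** n_bases equals the desired parental fraction Q."""
    return math.exp(math.log(q_parental) / n_bases)

def change_distribution(m, n_bases):
    """f_n for n = 0..n_bases: probability that exactly n of the
    varied bases are non-parental (binomial, independent bases)."""
    return [
        math.comb(n_bases, n) * m ** (n_bases - n) * (1 - m) ** n
        for n in range(n_bases + 1)
    ]

# Thirty varied bases with 1% parental sequence desired.
m = majority_fraction(0.01, 30)
f = change_distribution(m, 30)
most = max(range(len(f)), key=f.__getitem__)

print(f"majority component M = {m:.4f}")
print(f"f0 = {f[0]:.4f}, most probable number of changes = {most}")
```

Lowering Q shifts the whole distribution toward more substitutions; for example, `majority_fraction(1e-6, 30)` gives the M = .63096 case mentioned in the text.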

III.D. Special Considerations Relating to Variegation of
Mini-Proteins with Essential Cysteines

Several of the preferred simple or complex variegated
codons encode a set of amino acids which includes
cysteine. This means that some of the encoded binding
domains will feature one or more cysteines in addition to
the invariant disulfide-bonded cysteines. For example, at
each NNT-encoded position, there is a one in sixteen
chance of obtaining cysteine. If six codons are so
varied, the fraction of domains containing additional
cysteines is 0.33. Odd numbers of cysteines can lead to
complications, see Perry and Wetzel (PERR84). On the
other hand, many disulfide-containing proteins contain
cysteines that do not form disulfides, e.g. trypsin. The
possibility of unpaired cysteines can be dealt with in
several ways:

First, the variegated phage population can be passed
over an immobilized reagent that strongly binds free
thiols, such as SulfoLink (catalogue number 44895 H from
Pierce Chemical Company, Rockford, Illinois, 61105).
Another product from Pierce is TNB-Thiol Agarose
(Catalogue Code 20409 H). BioRad sells Affi-Gel 401
(catalogue 153-4599) for this purpose.

Second, one can use a variegation that excludes
cysteines, such as:

    NHT that gives [F,S,Y,L,P,H,I,T,N,V,A,D],
    VNS that gives
        [L2,P2,H,Q,R3,I,M,T2,N,K,S,V2,A2,E,D,G2],
    NNG that gives [L2,S,W,P,Q,R2,M,T,K,V,A,E,G,stop],
    SNT that gives [L,P,H,R,V,A,D,G],
    RNG that gives [M,T,K,R,V,A,E,G],
    RMG that gives [T,K,A,E],
    VNT that gives [L,P,H,R,I,T,N,S,V,A,D,G], or
    RRS that gives [N,S,K,R,D,E,G2].

However, each of these schemes has one or more 
disadvantages relative to NNT: a) fewer amino acids are 
allowed, b) amino acids are not evenly provided, c) acidic 
and basic amino acids are not equally likely, or d) stop 
codons occur. Nonetheless, NNG, NHT, and VNT are almost 
as useful as NNT. NNG encodes 13 different amino acids 
and one stop signal. Only two amino acids appear twice in 
the 16-fold mix. 
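
The amino-acid sets listed above can be checked mechanically. The 
following is a minimal sketch, not part of the disclosed method: it 
expands a variegated codon written in the standard IUPAC ambiguity 
letters against the standard genetic code, counts how often each amino 
acid occurs in the mix, and estimates the chance that a given number of 
positions yields at least one extra cysteine (near one third for six 
NNT codons). The function names `expand` and `p_extra_cys` are 
illustrative assumptions, and equimolar nucleotide mixtures are assumed. 

```python
from collections import Counter
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code with bases ordered T, C, A, G; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(vg_codon):
    """Count the amino acids encoded by an equimolar variegated codon."""
    pools = [IUPAC[base] for base in vg_codon.upper()]
    return Counter(CODON_TABLE["".join(codon)] for codon in product(*pools))


def p_extra_cys(vg_codon, n_positions):
    """Probability that at least one of n_positions encodes cysteine."""
    counts = expand(vg_codon)
    p_cys = counts["C"] / sum(counts.values())
    return 1.0 - (1.0 - p_cys) ** n_positions


if __name__ == "__main__":
    for vg in ("NNT", "NHT", "VNS", "NNG", "SNT", "RNG", "RMG", "VNT", "RRS"):
        counts = expand(vg)
        summary = ",".join(
            f"{aa}{n if n > 1 else ''}" for aa, n in sorted(counts.items())
        )
        print(f"{vg}: [{summary}]")
    print(f"P(extra Cys, six NNT codons) = {p_extra_cys('NNT', 6):.2f}")
```

Unequal nucleotide ratios, such as those of an optimized vg codon, 
would require weighting each base rather than counting codons equally. 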

Thirdly, one can enrich the population for binding to 
the preselected target, and evaluate selected sequences 
post hoc for extra cysteines. Those that contain more 
cysteines than the cysteines provided for conformational 
constraint may be perfectly usable. It is possible that 
a disulfide linkage other than the designed one will 
occur. This does not mean that the binding domain defined 
by the isolated DNA sequence is in any way unsuitable. 
The suitability of the isolated domains is best determined 
by chemical and biochemical evaluation of chemically 
synthesized peptides. 

Lastly, one can block free thiols with reagents, such 
as Ellman's reagent, iodoacetate, or methyl iodide, that 
specifically bind free thiols and that do not react with 
disulfides, and then leave the modified phage in the 
population. It is to be understood that the blocking 
agent may alter the binding properties of the mini-protein; 
thus, one might use a variety of blocking reagents 
in expectation that different binding domains will be 
found. The variegated population of thiol-blocked genetic 
packages is fractionated for binding. If the DNA 
sequence of the isolated binding mini-protein contains an 
odd number of cysteines, then synthetic means are used to 
prepare mini-proteins having each possible linkage and in 
which the odd thiol is appropriately blocked. Nishiuchi 
(NISH82, NISH86, and works cited therein) discloses methods 
of synthesizing peptides that contain a plurality of 
cysteines so that each thiol is protected with a different 
type of blocking group. These groups can be selectively 
removed so that the disulfide pairing can be controlled. 
We envision using such a scheme with the alteration that 
one thiol either remains blocked, or is unblocked and then 
reblocked with a different reagent. 

III.E. Planning the Second and Later Rounds of Variegation 

The method of the present invention allows efficient 
accumulation of information concerning the amino-acid 
sequence of a binding domain having high affinity for a 
predetermined target. Although one may obtain a highly 
useful binding domain from a single round of variegation 
and affinity enrichment, we expect that multiple rounds 
will be needed to achieve the highest possible affinity 
and specificity. 

If the first round of variegation results in some 
binding to the target, but the affinity for the target is 
still too low, further improvement may be achieved by 
variegation of the SBDs. Preferably, the process is 
progressive, i.e. each variegation cycle produces a better 
starting point for the next variegation cycle than the 
previous cycle produced. Setting the level of variegation 
such that the ppbd and many sequences related to the ppbd 
sequence are present in detectable amounts ensures that 
the process is progressive. If the level of variegation 
is so high that the ppbd sequence is present at such low 
levels that there is an appreciable chance that no 
transformant will display the PPBD, then the best SBD of the 
next round could be worse than the PPBD. At an excessively 
high level of variegation, each round of mutagenesis is 
independent of previous rounds and there is no assurance 
of progressivity. This approach can lead to valuable 
binding proteins, but repetition of experiments with this 
level of variegation will not yield progressive results. 
Excessive variation is not preferred. 

Progressivity is not an all-or-nothing property. So 
long as most of the information obtained from previous 
variegation cycles is retained and many different surfaces 
that are related to the PPBD surface are produced, the 
process is progressive. If the level of variegation is 
so high that the ppbd gene may not be detected, the assurance 
of progressivity diminishes. If the probability of 
recovering PPBD is negligible, then the probability of 
progressive behavior is also negligible. 

A level of variegation that allows recovery of the 
PPBD has two properties: 

1) we cannot regress because the PPBD is available, 

2) an enormous number of multiple changes related to the 
PPBD are available for selection and we are able to 
detect and benefit from these changes. 

It is very unlikely that all of the variants will be worse 
than the PPBD; we desire the presence of PPBD at detectable 
levels to ensure that all the sequences present are 
indeed related to PPBD. 
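
The requirement that the PPBD remain recoverable can be illustrated 
numerically. The sketch below is an assumption-laden illustration, not 
the disclosed procedure: it assumes every variegated codon is an 
equimolar mix, counts only the exact parental DNA sequence (a parental 
amino acid encoded by several codons in the mix would be more 
frequent), and treats transformants as independent draws, so that the 
number of parental clones is approximately Poisson distributed. The 
function names are hypothetical. 

```python
import math


def parental_fraction(codons_per_position):
    """Fraction of library DNA carrying the parental codon at every
    varied position, assuming equimolar mixes at each position."""
    return 1.0 / math.prod(codons_per_position)


def p_recover_parent(n_transformants, codons_per_position):
    """Poisson estimate of the chance that at least one independent
    transformant carries the parental sequence."""
    expected = n_transformants * parental_fraction(codons_per_position)
    return 1.0 - math.exp(-expected)


if __name__ == "__main__":
    six_nnt = [16] * 6  # six positions, 16 codons each
    for n in (10**6, 10**7, 10**8):
        print(f"{n:>11,d} transformants: "
              f"P(parent present) = {p_recover_parent(n, six_nnt):.3f}")
```

Under these assumptions, six fully variegated NNT positions leave the 
parental sequence very likely to be present among 10^8 transformants 
but unlikely among 10^6, which is the sense in which an excessive level 
of variegation forfeits the assurance of progressivity. 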

An opposing force in our design considerations is 
that PBDs are useful in the population only up to the 
amount that can be detected; any excess above the detectable 
amount is wasted. Thus we produce as many surfaces 
related to PPBD as possible within the constraint that the 
PPBD be detectable. 

If the level of variegation in the previous variegation 
cycle was correctly chosen, then the amino acids 
selected to be in the residues just varied are the ones 
best determined. The environment of other residues has 
changed, so that it is appropriate to vary them again. 
Because there are often more residues in the principal and 
secondary sets than can be varied simultaneously, we start 
by picking residues that either have never been varied 
(highest priority) or that have not been varied for one or 
more cycles. If we find that varying all the residues 
except those varied in the previous cycle does not allow a 
high enough level of diversity, then residues varied in 
the previous cycle might be varied again. For example, if 
M_ntv (the number of independent transformants that can be 
produced from Y_D100 of DNA) and C_sensi (the sensitivity of 
the affinity separation) were such that seven residues 
could be varied, and if the principal and secondary sets 
contained 13 residues, we would always vary seven residues, 
even though that implies varying some residue twice 
in a row. In such cases, we would pick the residues just 
varied that contain the amino acids of highest abundance 
in the variegated codons used. 

It is the accumulation of information that allows the 
process to select those protein sequences that produce 
binding between the SBD and the target. Some interfaces 
between proteins and other molecules involve twenty or 
more residues. Complete variation of twenty residues 
would generate 10^26 different proteins. By dividing the 
residues that lie close together in space into overlapping 
groups of five to seven residues, we can vary a large 
surface but never need to test more than 10^7 to 10^9 
candidates at once, a savings of 10^19 to 10^17 fold. The 
power of selection with accumulation of information is 
well illustrated in Chapter 3 of DAWK86. 
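
The orders of magnitude quoted above follow from simple counting. The 
short sketch below, offered only as an illustration, assumes all twenty 
amino acids are allowed at each varied residue and compares complete 
variation of twenty residues with variation of one group of five to 
seven residues at a time. The function name `sequence_space` is 
hypothetical. 

```python
def sequence_space(n_residues, alphabet=20):
    """Number of distinct sequences when n_residues are fully varied."""
    return alphabet ** n_residues


if __name__ == "__main__":
    full = sequence_space(20)
    print(f"20 residues at once: {full:.2e} proteins")
    for group in (5, 6, 7):
        per_round = sequence_space(group)
        print(f"group of {group}: {per_round:.2e} candidates per round, "
              f"savings {full / per_round:.1e} fold")
```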

Use of NNT or NNG variegated codons leads to very 
efficient sampling of variegated libraries because the 
ratio of (different amino-acid sequences)/(different DNA 
sequences) is much closer to unity than it is for NNK or 
even the optimized vg codon (fxS). Nevertheless, a few 
amino acids are omitted in each case. Both NNT and NNG 
allow members of all important classes of amino acids: 
hydrophobic, hydrophilic, acidic, basic, neutral hydrophilic, 
small, and large. After selecting a binding domain, 
a subsequent variegation and selection may be desirable to 
achieve a higher affinity or specificity. During this 
second variegation, amino acid possibilities overlooked by 
the preceding variegation may be investigated. 
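
The sampling ratio mentioned above can be made concrete. The sketch 
below is an illustration under stated assumptions: the counts for NNT 
(15 amino acids in 16 codons) and NNG (13 amino acids in 16 codons) 
follow the text, while the count for NNK (20 amino acids in 32 codons) 
is taken from the standard genetic code. Sequences containing a stop 
codon are not counted as amino-acid sequences, and the optimized vg 
codon (fxS) is omitted because its ratio depends on unequal nucleotide 
proportions. The names `CODON_STATS` and `sampling_ratio` are 
hypothetical. 

```python
# (distinct amino acids, distinct codons) for each variegated codon.
CODON_STATS = {"NNT": (15, 16), "NNG": (13, 16), "NNK": (20, 32)}


def sampling_ratio(vg_codon, n_positions):
    """(Different amino-acid sequences) / (different DNA sequences)
    when n_positions are all varied with the same codon."""
    n_amino, n_codons = CODON_STATS[vg_codon]
    return (n_amino / n_codons) ** n_positions


if __name__ == "__main__":
    for name in CODON_STATS:
        print(f"{name}: ratio over six positions = "
              f"{sampling_ratio(name, 6):.3f}")
```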

In the first round, we assume that the parental 
protein has no known affinity for the target material. 
For example, consider the parental mini-protein similar 
to that discussed in Example 11, having the structure 
X1-C2-X3-X4-X5-X6-C7-X8 in which C2 and C7 form a disulfide 
bond. Introduction of extra cysteines may cause alternative 
structures to form which might be disadvantageous. 
Accidental cysteines at positions 4 or 5 are thought to be 
potentially more troublesome than at the other positions. 
We adopt the pattern of variegation: X1:NNT, X3:NNT, 
X4:NNG, X5:NNG, X6:NNT, and X8:NNT, so that cysteine 
cannot occur at positions 4 and 5. (Table 131 shows the 
number of different amino acids expected in libraries 
prepared with DNA variegated in this way and comprising 
different numbers of independent transformants.) 

In the second round of variegation, a preferred 
strategy is to vary each position through a new set of 
residues which includes the amino acid(s) which were found 
at that position in the successful binding domains, and 
which includes as many as possible of the residues which 
were excluded in the first round of variegation. 

A few examples may be helpful. Suppose we obtained 
PRO using NNT. This amino acid is available with either 
NNT or NNG. We can be reasonably sure that PRO is the 
best amino acid from the set [PRO, LEU, VAL, THR, ALA, 
ARG, GLY, PHE, TYR, CYS, HIS, ILE, ASN, ASP, SER]. Thus 
we need to try a set that includes [PRO, TRP, GLN, MET, 
LYS, GLU]. The set allowed by NNG is the preferred set. 

What if we obtained HIS instead? Histidine is 
aromatic and fairly hydrophobic and can form hydrogen 
bonds to and from the imidazole ring. Tryptophan is 
hydrophobic and aromatic and can donate a hydrogen to a 
suitable acceptor and was excluded by the NNT codon. 
Methionine was also excluded and is hydrophobic. Thus, 
one preferred course is to use the variegated codon HDS 
that allows [HIS, GLN, ASN, LYS, TYR, CYS, TRP, ARG, SER, 
GLY, <stop>]. 

GLN can be encoded by the NNG codon. If GLN is 
selected, at the next round we might use the vg codon VAS 
that encodes three of the seven excluded possibilities, 
viz. HIS, ASN, and ASP. The codon VAS encodes six amino 
acid sequences in six DNA sequences. This leaves PHE, 
CYS, TYR, and ILE untested, but these are all very 
hydrophobic. Switching to NNT would be undesirable because 
that would exclude GLN. One could use NAS that includes 
TYR and <stop>. Suppose the successful amino acid encoded 
by an NNG codon was ARG. Here we switch to NNT because 
this allows ARG plus all the excluded possibilities. 

THR is another possibility with the NNT codon. If 
THR is selected, we switch to NNG because that includes 
the previously excluded possibilities and includes THR. 
Suppose the successful amino acid encoded by the NNT codon 
was ASP. We use RRS at the next variegation because this 
includes both acidic amino acids plus LYS and ARG. One 
could also use VRS to allow GLN. 

Thus, later rounds of variegation test both amino 
acid positions not previously mutated, and amino acid 
substitutions at a previously mutated position which were 
not within the previous substitution set. 
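
The reasoning in the GLN, ARG, THR, and ASP examples above can be 
sketched as a small check. The following is an illustration only, 
assuming the standard genetic code and IUPAC ambiguity letters; the 
helper names `encoded` and `assess` are hypothetical. Given the 
first-round codon, the amino acid selected, and a candidate 
second-round codon, it reports whether the candidate retains the 
selected residue, which previously excluded amino acids it now tests, 
and which remain untested. 

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code with bases ordered T, C, A, G; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
ALL_AMINO_ACIDS = set(AMINO) - {"*"}


def encoded(vg_codon):
    """Set of amino acids (stop excluded) encoded by a variegated codon."""
    pools = [IUPAC[base] for base in vg_codon.upper()]
    return {CODON_TABLE["".join(c)] for c in product(*pools)} - {"*"}


def assess(first_round, selected, candidate):
    """Compare a candidate second-round codon with the first-round codon."""
    untested = ALL_AMINO_ACIDS - encoded(first_round)
    cand = encoded(candidate)
    return {
        "keeps_selected": selected in cand,
        "newly_tested": sorted(untested & cand),
        "still_untested": sorted(untested - cand),
    }


if __name__ == "__main__":
    for first, aa, second in (("NNG", "Q", "VAS"), ("NNG", "R", "NNT"),
                              ("NNT", "T", "NNG"), ("NNT", "D", "RRS")):
        print(first, aa, second, assess(first, aa, second))
```

One-letter codes are used here for brevity (Q for GLN, R for ARG, T for 
THR, D for ASP). 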

If the first round of variegation is entirely 
unsuccessful, a different pattern of variegation should be 
used. For example, if more than one interaction set can 
be defined within a domain, the residues varied in the 
next round of variegation should be from a different set 
than that probed in the initial variegation. If repeated 
failures are encountered, one may switch to a different 
IPBD. 

IV. DISPLAY STRATEGY: DISPLAYING FOREIGN BINDING DOMAINS 
ON THE SURFACE OF A "GENETIC PACKAGE" 

IV.A. General Requirements for Genetic Packages 

It is emphasized that the GP on which 
selection-through-binding will be practiced must be capable, after 
the selection, either of growth in some suitable environment 
or of in vitro amplification and recovery of the 
encapsulated genetic message. During at least part of the 
growth, the increase in number is preferably approximately 
exponential with respect to time. The component of a 
population that exhibits the desired binding properties 
may be quite small, for example, one in 10^6 or less. Once 
this component of the population is separated from the 
non-binding components, it must be possible to amplify it. 
Culturing viable cells is the most powerful amplification 
of genetic material known and is preferred. Genetic 
messages can also be amplified in vitro, e.g. by PCR, but 
this is not the most preferred method. 

Preferred GPs are vegetative bacterial cells, 
bacterial spores and bacterial DNA viruses. Eukaryotic 
cells could be used as genetic packages but have longer 
dividing times and more stringent nutritional requirements 
than do bacteria and it is much more difficult to produce 
a large number of independent transformants. They are also 
more fragile than bacterial cells and therefore more 
difficult to chromatograph without damage. Eukaryotic 
viruses could be used instead of bacteriophage but must be 
propagated in eukaryotic cells and therefore suffer from 
some of the amplification problems mentioned above. 

Nonetheless, a strain of any living cell or virus is 
potentially useful if the strain can be: 1) genetically 
altered with reasonable facility to encode a potential 
binding domain, 2) maintained and amplified in culture, 
3) manipulated to display the potential binding protein 
domain where it can interact with the target material 
during affinity separation, and 4) affinity separated 
while retaining the genetic information encoding the 
displayed binding domain in recoverable form. Preferably, 
the GP remains viable after affinity separation. 

When the genetic package is a bacterial cell, or a 
phage which is assembled periplasmically, the display 
means has two components. The first component is a 
secretion signal which directs the initial expression 
product to the inner membrane of the cell (a host cell 
when the package is a phage). This secretion signal is 
cleaved off by a signal peptidase to yield a processed, 
mature, potential binding protein. The second component 
is an outer surface transport signal which directs the 
package to assemble the processed protein into its outer 
surface. Preferably, this outer surface transport signal 
is derived from a surface protein native to the genetic 
package. 

For example, in a preferred embodiment, the hybrid 
gene comprises a DNA encoding a potential binding domain 
operably linked to a signal sequence (e.g., the signal 
sequences of the bacterial phoA or bla genes or the 
signal sequence of M13 phage gene III) and to DNA encoding 
a coat protein (e.g., the M13 gene III or gene VIII 
proteins) of a filamentous phage (e.g., M13). The 
expression product is transported to the inner membrane 
(lipid bilayer) of the host cell, whereupon the signal 
peptide is cleaved off to leave a processed hybrid 
protein. The C-terminus of the coat protein-like 
component of this hybrid protein is trapped in the lipid 
bilayer, so that the hybrid protein does not escape into 
the periplasmic space. (This is typical of the wild-type 
coat protein.) As the single-stranded DNA of the nascent 
phage particle passes into the periplasmic space, it 
collects both wild-type coat protein and the hybrid 
protein from the lipid bilayer. The hybrid protein is 
thus packaged into the surface sheath of the filamentous 
phage, leaving the potential binding domain exposed on its 
outer surface. (Thus, the filamentous phage, not the host 
bacterial cell, is the "replicable genetic package" in 
this embodiment.) 

If a secretion signal is necessary for the display of 
the potential binding domain, in an especially preferred 
embodiment the bacterial cell in which the hybrid gene is 
expressed is of a "secretion-permissive" strain. 

When the genetic package is a bacterial spore, or a 
phage whose coat is assembled intracellularly, a secretion 
signal directing the expression product to the inner 
membrane of the host bacterial cell is unnecessary. In 
these cases, the display means is merely the outer surface 
transport signal, typically a derivative of a spore or 
phage coat protein. 

There are several methods of arranging that the ipbd 
gene is expressed in such a manner that the IPBD is 
displayed on the outer surface of the GP. If one or more 
fusions of fragments of x genes to fragments of a natural 
osp gene are known to cause X protein domains to appear on 
the GP surface, then we pick the DNA sequence in which an 
ipbd gene fragment replaces the x gene fragment in one of 
the successful osp-x fusions as a preferred gene to be 
tested for the display-of-IPBD phenotype. (The gene may 
be constructed in any manner.) If no fusion data are 
available, then we fuse an ipbd fragment to various 
fragments, such as fragments that end at known or 
predicted domain boundaries, of the osp gene and obtain GPs 
that display the osp-ipbd fusion on the GP outer surface 
by screening or selection for the display-of-IPBD 
phenotype. The OSP may be modified so as to increase the 
flexibility and/or length of the linkage between the OSP 
and the IPBD and thereby reduce interference between the 
two. 

The fusion of ipbd and osp fragments may also include 
fragments of random or pseudorandom DNA to produce a 
population, members of which may display IPBD on the GP 
surface. The members displaying IPBD are isolated by 
screening or selection for the display-of-binding 
phenotype. 

The replicable genetic entity (phage or plasmid) that 
carries the osp-pbd genes (derived from the osp-ipbd gene) 
through the selection-through-binding process is referred 
to hereinafter as the operative cloning vector (OCV). 
When the OCV is a phage, it may also serve as the genetic 
package. The choice of a GP is dependent in part on the 
availability of a suitable OCV and suitable OSP. 

Preferably, the GP is readily stored, for example, by 
freezing. If the GP is a cell, it should have a short 
doubling time, such as 20-40 minutes. If the GP is a 
virus, it should be prolific, e.g., a burst size of at 
least 100/infected cell. GPs which are finicky or 
expensive to culture are disfavored. The GP should be 
easy to harvest, preferably by centrifugation. The GP is 
preferably stable for a temperature range of -70 to 42 °C 
(stable at 4°C for several days or weeks); resistant to 
shear forces found in HPLC; insensitive to UV; tolerant of 
desiccation; and resistant to a pH of 2.0 to 10.0, surface 
active agents such as SDS or Triton, chaotropes such as 4M 
urea or 2M guanidinium HCl, common ions such as K+, Na+, 
and SO4(2-), common organic solvents such as ether and 
acetone, and degradative enzymes. Finally, there must be 
a suitable OCV. 

Although knowledge of specific OSPs may not be 
required for vegetative bacterial cells and endospores, 
the user of the present invention, preferably, will know: 
Is the sequence of any osp known? (preferably yes, at 
least one required for phage). How does the OSP arrive at 
the surface of GP? (knowledge of route necessary, 
different routes have different uses, no route preferred per 
se). Is the OSP post-translationally processed? (no 
processing most preferred, predictable processing 
preferred over unpredictable processing). What rules are known 
governing this processing, if there is any processing? (no 
processing most preferred, predictable processing 
acceptable). What function does the OSP serve in the outer 
surface? (preferably not essential). Is the 3D structure 
of an OSP known? (highly preferred). Are fusions between 
fragments of osp and a fragment of x known? Does 
expression of these fusions lead to X appearing on the surface 
of the GP? (fusion data is as preferred as knowledge of a 
3D structure). Is a "2D" structure of an OSP available? 
(in this context, a "2D" structure indicates which 
residues are exposed on the cell surface) (2D structure 
less preferred than 3D structure). Where are the domain 
boundaries in the OSP? (not as preferred as a 2D 
structure, but acceptable). Could IPBD go through the same 
process as OSP and fold correctly? (IPBD might need 
prosthetic groups) (preferably IPBD will fold after same 
process). Is the sequence of an osp promoter known? 
(preferably yes). Is an osp gene controlled by a regulatable 
promoter available? (preferably yes). What activates this 
promoter? (preferably a diffusible chemical, such as 
IPTG). How many different OSPs do we know? (the more the 
better). How many copies of each OSP are present on each 
package? (more is better). 

The user will want knowledge of the physical 
attributes of the GP: How large is the GP? (knowledge useful 
in deciding how to isolate GPs) (preferably easy to 
separate from soluble proteins such as IgGs). What is the 
charge on the GP? (neutral preferred). What is the 
sedimentation rate of the GP? (knowledge preferred, no 
particular value preferred). 

The preferred GP, OCV and OSP are those for which the 
fewest serious obstacles can be seen, rather than the one 
that scores highest on any one criterion. 

Viruses are preferred over bacterial cells and spores 
(cp. LUIT85 and references cited therein). The virus is 
preferably a DNA virus with a genome size of 2 kb to 10 kb, 
such as (but not limited to) the filamentous 
(Ff) phage M13, fd, and f1 (inter alia see RASC86, BOEK80, 
BOEK82, DAYL88, GRAY81b, KUHN88, LOPE85, WEBS85, MARV75, 
MARV80, MOSE82, CRIS84, SMIT88a, SMIT88b); the IncN 
specific phage Ike and If1 (NAKA81, PEET85, PEET87, 
THOM83, THOM88a); IncP-specific Pseudomonas aeruginosa 
phage Pf1 (THOM83, THOM88a) and Pf3 (LUIT83, LUIT85, 
LUTI87, THOM88a); and the Xanthomonas oryzae phage Xf 
(THOM83, THOM88a). Filamentous phage are especially 
preferred. 

Preferred OSPs for several GPs are given in Table 2. 
References to osp-ipbd fusions in this section should be 
taken to apply, mutatis mutandis, to osp-pbd and osp-sbd 
fusions as well. 

The species chosen as a GP should have a 
well-characterized genetic system and strains defective in genetic 
recombination should be available. The chosen strain may 
need to be manipulated to prevent changes of its 
physiological state that would alter the number or type of 
proteins or other molecules on the cell surface during the 
affinity separation procedure. 

IV.B. Phages for Use as GPs: 

Unlike the case of bacterial cells and spores, choice of a 
phage depends strongly on knowledge of the 3D structure of an 
OSP and how it interacts with other proteins in the 
capsid. This does not mean that we need atomic resolution 
of the OSP, but that we need to know which segments of the 
OSP interact to make the viral coat and which segments are 
not constrained by structural or functional roles. The 
size of the phage genome and the packaging mechanism are 
also important because the phage genome itself is the 
cloning vector. The osp-ipbd gene is inserted into the 
phage genome; therefore: 1) the genome of the phage must 
allow introduction of the osp-ipbd gene either by 
tolerating additional genetic material or by having replaceable 
genetic material; 2) the virion must be capable of 
packaging the genome after accepting the insertion or 
substitution of genetic material; and 3) the display of 
the OSP-IPBD protein on the phage surface must not disrupt 
virion structure sufficiently to interfere with phage 
propagation. 

The morphogenetic pathway of the phage determines the 
environment in which the IPBD will have opportunity to 
fold. Periplasmically assembled phage are preferred when 
IPBDs contain essential disulfides, as such IPBDs may not 
fold within a cell (these proteins may fold after the 
phage is released from the cell). Intracellularly 
assembled phage are preferred when the IPBD needs large or 
insoluble prosthetic groups (such as Fe4S4 clusters), 
since the IPBD may not fold if secreted because the 
prosthetic group is lacking. 

When variegation is introduced in Part II, multiple 
infections could generate hybrid GPs that carry the gene 
for one PBD but have at least some copies of a different 
PBD on their surfaces; it is preferable to minimize this 
possibility by infecting cells with phage under conditions 
resulting in a low multiplicity of infection (MOI). 
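
The benefit of a low MOI can be estimated with a simple model. The 
sketch below assumes, as is conventional but not stated in the text, 
that the number of phage infecting each cell follows a Poisson 
distribution whose mean is the MOI; it then reports what fraction of 
infected cells received two or more phage and could therefore produce 
hybrid GPs. The function name is hypothetical. 

```python
import math


def fraction_multiply_infected(moi):
    """Among infected cells, the fraction that received two or more
    phage, assuming Poisson-distributed infection with mean moi."""
    if moi <= 0:
        raise ValueError("moi must be positive")
    p0 = math.exp(-moi)   # probability a cell receives no phage
    p1 = moi * p0         # probability a cell receives exactly one
    return (1.0 - p0 - p1) / (1.0 - p0)


if __name__ == "__main__":
    for moi in (0.01, 0.1, 1.0, 5.0):
        print(f"MOI {moi:>5}: {fraction_multiply_infected(moi):.3f} "
              "of infected cells are multiply infected")
```

Under this model an MOI of 0.1 leaves fewer than one infected cell in 
twenty multiply infected, whereas at an MOI of 5 nearly all infected 
cells are. 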

Bacteriophages are excellent candidates for GPs 
because there is little or no enzymatic activity 
associated with intact mature phage, and because the genes are 
inactive outside a bacterial host, rendering the mature 
phage particles metabolically inert. 

The filamentous phages (e.g., M13) are of particular 
interest. 

For a given bacteriophage, the preferred OSP is 
usually one that is present on the phage surface in the 
largest number of copies, as this allows the greatest 
flexibility in varying the ratio of OSP-IPBD to wild type 
OSP and also gives the highest likelihood of obtaining 
satisfactory affinity separation. Moreover, a protein 
present in only one or a few copies usually performs an 
essential function in morphogenesis or infection; mutating 
such a protein by addition or insertion is likely to 
result in reduction in viability of the GP. Nevertheless, 
an OSP such as M13 gIII protein may be an excellent choice 
as OSP to cause display of the PBD. 

It is preferred that the wild-type osp gene be 
preserved. The ipbd gene fragment may be inserted either 
into a second copy of the recipient osp gene or into a 
novel engineered osp gene. It is preferred that the 
osp-ipbd gene be placed under control of a regulated promoter. 
Our process forces the evolution of the PBDs derived from 
IPBD so that some of them develop a novel function, viz., 
binding to a chosen target. Placing the gene that is 
subject to evolution on a duplicate gene is an imitation 
of the widely-accepted scenario for the evolution of 
protein families. It is now generally accepted that gene 
duplication is the first step in the evolution of a 
protein family from an ancestral protein. By having two 
copies of a gene, the affected physiological process can 
tolerate mutations in one of the genes. This process is 
well understood and documented for the globin family (cf. 
DICK83, p65ff, and CREI84, p117-125). 

The user must choose a site in the candidate OSP gene 
for inserting an ipbd gene fragment. The coats of most 
bacteriophage are highly ordered. Filamentous phage can 
be described by a helical lattice; isometric phage, by an 
icosahedral lattice. Each monomer of each major coat 
protein sits on a lattice point and makes defined 
interactions with each of its neighbors. Proteins that fit into 
the lattice by making some, but not all, of the normal 
lattice contacts are likely to destabilize the virion by: 
a) aborting formation of the virion, b) making the virion 
unstable, or c) leaving gaps in the virion so that the 
nucleic acid is not protected. Thus in bacteriophage, 
unlike the cases of bacteria and spores, it is important 
to retain in engineered OSP-IPBD fusion proteins those 
residues of the parental OSP that interact with other 
proteins in the virion. For M13 gVIII, we retain the 
entire mature protein, while for M13 gIII, it might 
suffice to retain the last 100 residues (or even fewer). 
Such a truncated gIII protein would be expressed in 
parallel with the complete gIII protein, as gIII protein 
is required for phage infectivity. 

Il'ichev et al. (ILIC89) have reported viable phage 
having alterations in gene VIII. In one case, a point 
mutation changed one amino acid near the amino terminus of 
the mature gVIII protein from GLU to ASP. In the other 
case, five amino acids were inserted at the site of the 
first mutation. They suggested that similar constructions 
could be used for vaccines. They did not report on any 
binding properties of the modified phage, nor did they 
suggest mutagenizing the inserted material. Furthermore, 
they did not insert a binding domain, nor did they suggest 
inserting such a domain. 

Further considerations on the design of the ipbd::osp 
gene are discussed in Section IV.F. 

Filamentous phage: 

Compared to other bacteriophage, filamentous phage in 
general are attractive and M13 in particular is especially 
attractive because: 1) the 3D structure of the virion is 
known; 2) the processing of the coat protein is well 
understood; 3) the genome is expandable; 4) the genome is 
small; 5) the sequence of the genome is known; 6) the 
virion is physically resistant to shear, heat, cold, urea, 
guanidinium Cl, low pH, and high salt; 7) the phage is a 
sequencing vector so that sequencing is especially easy; 
8) antibiotic-resistance genes have been cloned into the 
genome with predictable results (HINE80); 9) it is easily 
cultured and stored (FRIT85), with no unusual or expensive 
media requirements for the infected cells; 10) it has a 
high burst size, each infected cell yielding 100 to 1000 
M13 progeny after infection; and 11) it is easily 
harvested and concentrated (SALI64, FRIT85). 

The filamentous phage include M13, f1, fd, If1, Ike, 
Xf, Pf1, and Pf3. 

The entire life cycle of the filamentous phage M13, a 
common cloning and sequencing vector, is well understood. 
M13 and f1 are so closely related that we consider the 
properties of each relevant to both (RASC86); any 
differentiation is for historical accuracy. The genetic 
structure (the complete sequence (SCHA78), the identity 
and function of the ten genes, and the order of 
transcription and location of the promoters) of M13 is well known 
as is the physical structure of the virion (BANN81, 
BOEK80, CHAN79, ITOK79, KAPL78, KUHN85b, KUHN87, MAKO80, 
MARV78, MESS78, OHKA81, RASC86, RUSS81, SCHA78, SMIT85, 
WEBS78, and ZIMM82); see RASC86 for a recent review of the 
structure and function of the coat proteins. Because the 
genome is small (6423 bp), cassette mutagenesis is 
practical on RF M13 (AUSU87), as is single-stranded 
oligonucleotide-directed mutagenesis (FRIT85). M13 is a plasmid and 
transformation system in itself, and an ideal sequencing 
vector. M13 can be grown on Rec- strains of E. coli. The 
M13 genome is expandable (MESS78, FRIT85) and M13 does not 
lyse cells. Because the M13 genome is extruded through 
the membrane and coated by a large number of identical 
protein molecules, it can be used as a cloning vector 
(WATS87 p278, and MESS77). Thus we can insert extra genes 
into M13 and they will be carried along in a stable 
manner. 

Marvin and collaborators (MARV78, MAKO80, BANN81) 
have determined an approximate 3D virion structure of f1 
by a combination of genetics, biochemistry, and X-ray 
diffraction from fibers of the virus. Figure 4 is drawn 
after the model of Banner et al. (BANN81) and shows only 
the Cα's of the protein. The apparent holes in the 
cylindrical sheath are actually filled by protein side 
groups so that the DNA within is protected. The amino 
terminus of each protein monomer is to the outside of the 
cylinder, while the carboxy terminus is at smaller radius, 
near the DNA. Although other filamentous phages (e.g. Pf1 
or Ike) have different helical symmetry, all have coats 
composed of many short α-helical monomers with the amino 
terminus of each monomer on the virion surface. 

The major coat protein is encoded by gene VIII. The
50 amino acid mature gene VIII coat protein is synthesized
as a 73 amino acid precoat (ITOK79). The first 23 amino
acids constitute a typical signal-sequence which causes
the nascent polypeptide to be inserted into the inner cell
membrane. Whether the precoat inserts into the membrane
by itself or through the action of host secretion
components, such as SecA and SecY, remains controversial,
but has no effect on the operation of the present invention.

An E. coli signal peptidase (SP-I) recognizes amino
acids 18, 21, and 23, and, to a lesser extent, residue 22,
and cuts between residues 23 and 24 of the precoat
(KUHN85a, KUHN85b, OLIV87). After removal of the signal
sequence, the amino terminus of the mature coat is located
on the periplasmic side of the inner membrane; the carboxy
terminus is on the cytoplasmic side. About 3000 copies of
the mature 50 amino acid coat protein associate
side-by-side in the inner membrane.

The sequence of gene VIII is known, and the amino
acid sequence can be encoded on a synthetic gene that is
placed under the lacUV5 promoter and used in conjunction
with the LacIq repressor. The lacUV5 promoter is induced
by IPTG. Mature gene VIII protein makes up the sheath
around the circular ssDNA. The 3D structure of f1 virion
is known at medium resolution; the amino terminus of gene
VIII protein is on the surface of the virion. A few
modifications of gene VIII have been made and are discussed
below. The 2D structure of M13 coat protein is implicit in
the 3D structure. Mature M13 gene VIII protein has only one
domain.

When the GP is M13, the gene III and the gene VIII
proteins are highly preferred as OSP (see Examples I
through IV). The proteins from genes VI, VII, and IX may
also be used.

As discussed in the Examples, we have constructed a
tripartite gene comprising:

1) DNA encoding a signal sequence directing secretion
of parts (2) and (3) through the inner membrane,

2) DNA encoding the mature BPTI sequence, and

3) DNA encoding the mature M13 gVIII protein.

This gene causes BPTI to appear in active form on the
surface of M13 phage.

The gene VIII protein is a preferred OSP because it
is present in many copies and because its location and
orientation in the virion are known (BANN81). Preferably,
the PBD is attached to the amino terminus of the mature
M13 coat protein. Had direct fusion of PBD to M13 CP
failed to cause PBD to be displayed on the surface of M13,
we would have varied part of the mini-protein sequence
and/or inserted short random or nonrandom spacer sequences
between mini-protein and M13 CP. The 3D model of f1
indicates strongly that fusing IPBD to the amino terminus
of M13 CP is more likely to yield a functional chimeric
protein than any other fusion site.

Similar constructions could be made with other
filamentous phage. Pf3 is a well known filamentous phage
that infects Pseudomonas aeruginosa cells that harbor an
IncP-1 plasmid. The entire genome has been sequenced
(LUIT85) and the genetic signals involved in replication
and assembly are known (LUIT87). The major coat protein
of Pf3 is unusual in having no signal peptide to direct
its secretion. The sequence has charged residues ASP7,
ARG37, LYS40, and PHE44-COO-, which is consistent with the
amino terminus being exposed. Thus, to cause an IPBD to
appear on the surface of Pf3, we construct a tripartite
gene comprising:

1) a signal sequence known to cause secretion in P.
aeruginosa (preferably known to cause secretion of
IPBD) fused in-frame to,

2) a gene fragment encoding the IPBD sequence, fused
in-frame to,

3) DNA encoding the mature Pf3 coat protein.
Optionally, DNA encoding a flexible linker of one to 10
amino acids is introduced between the ipbd gene fragment
and the Pf3 coat-protein gene. Optionally, DNA encoding
the recognition site for a specific protease, such as
tissue plasminogen activator or blood clotting Factor Xa,
is introduced between the ipbd gene fragment and the Pf3
coat-protein gene. Amino acids that form the recognition
site for a specific protease may also serve the function
of a flexible linker. This tripartite gene is introduced
into Pf3 so that it does not interfere with expression of
any Pf3 genes. To reduce the possibility of genetic
recombination, part (3) is designed to have numerous
silent mutations relative to the wild-type gene. Once the
signal sequence is cleaved off, the IPBD is in the
periplasm and the mature coat protein acts as an anchor
and phage-assembly signal. It matters not that this
fusion protein comes to rest in the lipid bilayer by a
route different from the route followed by the wild-type
coat protein.

The amino-acid sequence of M13 pre-coat (SCHA78),
called AA_seq1, is

AA_seq1

         1    1    2  | 2    3    3    4    4    5
    5    0    5    0  v 5    0    5    0    5    0
MKKSLVLKASVAVATLVPMLSFAAEGDDPAKAAFNSLQASATEYIGYAWA

    5    6    6    7  7
    5    0    5    0  3
MVVVIVGATIGIKLFKKFTSKAS

The single-letter codes for amino acids and the codes for
ambiguous DNA are given in Table 1. The best site for
inserting a novel protein domain into M13 CP is after A23
because SP-I cleaves the precoat protein after A23, as
indicated by the arrow. Proteins that can be secreted
will appear connected to mature M13 CP at its amino
terminus. Because the amino terminus of mature M13 CP is
located on the outer surface of the virion, the introduced
domain will be displayed on the outside of the virion.

The uncertainty of the mechanism by which M13 CP appears in
the lipid bilayer raises the possibility that direct
insertion of bpti into gene VIII may not yield a
functional fusion protein. It may be necessary to change the
signal sequence of the fusion to, for example, the phoA
signal sequence (MKQSTIALALLPLLFTPVTKA). Marks et
al. (MARK86) showed that the PhoA signal peptide could
direct mature BPTI to the E. coli periplasm.
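
The arrangement of signal sequence, inserted domain, and mature coat protein described above can be illustrated with a short Python sketch. It is offered only as an illustration under simplifying assumptions: processing by SP-I is modeled as a plain cut after a fixed number of residues, the function names are hypothetical, and DOMAIN is a made-up placeholder rather than a real binding domain. The precoat and phoA signal sequences are those stated in the text.

```python
# Minimal sketch: model the M13 gene VIII precoat, its cleavage after
# residue 23, and an amino-terminal fusion to the mature coat protein.
# All function names and DOMAIN are illustrative assumptions.

# AA_seq1 as stated in the text (73 residues).
PRECOAT = (
    "MKKSLVLKASVAVATLVPMLSFAAEGDDPAKAAFNSLQASATEYIGYAWA"
    "MVVVIVGATIGIKLFKKFTSKAS"
)
SIGNAL_LEN = 23  # SP-I cuts between residues 23 and 24

# phoA signal sequence as stated in the text.
PHOA_SIGNAL = "MKQSTIALALLPLLFTPVTKA"


def split_precoat(precoat: str, signal_len: int = SIGNAL_LEN) -> tuple[str, str]:
    """Return (signal peptide, mature coat protein)."""
    return precoat[:signal_len], precoat[signal_len:]


def build_fusion(signal: str, domain: str, mature_coat: str) -> str:
    """Tripartite product: signal, then domain, then mature coat."""
    return signal + domain + mature_coat


def displayed_product(fusion: str, signal: str) -> str:
    """Idealized processing: strip the signal peptide from the fusion."""
    if not fusion.startswith(signal):
        raise ValueError("fusion does not begin with the given signal")
    return fusion[len(signal):]


signal, mature = split_precoat(PRECOAT)
DOMAIN = "GSGSGS"  # hypothetical placeholder, not a real binding domain

native_signal_fusion = build_fusion(signal, DOMAIN, mature)
phoa_signal_fusion = build_fusion(PHOA_SIGNAL, DOMAIN, mature)

print(len(signal), len(mature))
print(displayed_product(phoa_signal_fusion, PHOA_SIGNAL))
```

In this idealized model, swapping the native signal for the phoA signal leaves the processed product unchanged: the inserted domain sits at the amino terminus of the mature coat protein in both cases. Whether secretion actually succeeds is an experimental question that the sketch does not address.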

Another vehicle for displaying the IPBD is by
expressing it as a domain of a chimeric gene containing
part or all of gene III. This gene encodes one of the
minor coat proteins of M13. Genes VI, VII, and IX also
encode minor coat proteins. Each of these minor proteins
is present in about 5 copies per virion and is related to
morphogenesis or infection. In contrast, the major coat
protein is present in more than 2500 copies per virion.
The gene VI, VII, and IX proteins are present at the ends
of the virion; these three proteins are not
post-translationally processed (RASC86).

The single-stranded circular phage DNA associates
with about five copies of the gene III protein and is then
extruded through the patch of membrane-associated coat
protein in such a way that the DNA is encased in a helical
sheath of protein (WEBS78). The DNA does not base pair
(that would impose severe restrictions on the virus
genome); rather the bases intercalate with each other
independent of sequence.

Smith (SMIT85) and de la Cruz et al. (DELA88) have
shown that insertions into gene III cause novel protein
domains to appear on the virion outer surface. The
mini-protein's gene may be fused to gene III at the site used
by Smith and by de la Cruz et al., at a codon
corresponding to another domain boundary or to a surface
loop of the protein, or to the amino terminus of the mature
protein.

All published works use a vector containing a single
modified gene III of fd. Thus, all five copies of gIII
are identically modified. Gene III is quite large (1272
b.p. or about 20% of the phage genome) and it is uncertain
whether a duplicate of the whole gene can be stably
inserted into the phage. Furthermore, all five copies of
gIII protein are at one end of the virion. When bivalent
target molecules (such as antibodies) bind a pentavalent
phage, the resulting complex may be irreversible.
Irreversible binding of the GP to the target greatly
interferes with affinity enrichment of the GPs that carry
the genetic sequences encoding the novel polypeptide
having the highest affinity for the target.

To reduce the likelihood of formation of irreversible
complexes, we may use a second, synthetic gene that
encodes carboxy-terminal parts of III. We might, for
example, engineer a gene that consists of (from 5' to 3'):

1) a promoter (preferably regulated),

2) a ribosome-binding site,

3) an initiation codon,

4) a functional signal peptide directing secretion of
parts (5) and (6) through the inner membrane,

5) DNA encoding an IPBD,

6) DNA encoding residues 275 through 424 of M13 gIII
protein,

7) a translation stop codon, and

8) (optionally) a transcription stop signal.

We leave the wild-type gene III so that some unaltered
gene III protein will be present. Alternatively, we may
use gene VIII protein as the OSP and regulate the
osp::ipbd fusion so that only one or a few copies of the
fusion protein appear on the phage.
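
The sizes quoted above can be checked with a little arithmetic. The following Python sketch is illustrative only; it uses the 6423 bp genome size, the 1272 bp size of gene III, and the residue range 275 through 424 given in the text, and it assumes (an assumption not stated in the source) that the 1272 bp figure corresponds to whole codons with nothing else counted.

```python
# Worked arithmetic for the gene III figures quoted in the text.
# Treating 1272 bp as whole codons is an assumption of this sketch.

GENOME_BP = 6423          # M13 genome size from the text
GENE_III_BP = 1272        # gene III size from the text
FRAGMENT = (275, 424)     # gIII residues kept in the synthetic gene

# Ordered parts of the synthetic gene, 5' to 3', as listed in the text.
ELEMENTS = (
    "promoter",
    "ribosome-binding site",
    "initiation codon",
    "signal peptide",
    "IPBD",
    "gIII residues 275-424",
    "translation stop codon",
    "transcription stop signal (optional)",
)

fraction_of_genome = GENE_III_BP / GENOME_BP
gene_iii_codons, remainder = divmod(GENE_III_BP, 3)

first, last = FRAGMENT
fragment_residues = last - first + 1
fragment_bp = 3 * fragment_residues

print(f"gene III is {fraction_of_genome:.1%} of the genome")
print(f"gene III spans {gene_iii_codons} codons (remainder {remainder})")
print(f"carboxy-terminal fragment: {fragment_residues} residues, {fragment_bp} bp")
```

Under these assumptions the carboxy-terminal fragment adds roughly a third as much DNA as a full duplicate of gene III would, which is consistent with the stated concern about stably inserting a second copy of the whole gene.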

M13 gene VI, VII, and IX proteins are not processed
after translation. The route by which these proteins are
assembled into the phage has not been reported. These
proteins are necessary for normal morphogenesis and
infectivity of the phage. Whether these molecules (gene
VI protein, gene VII protein, and gene IX protein) attach
themselves to the phage: a) from the cytoplasm, b) from
the periplasm, or c) from within the lipid bilayer, is not
known. One could use any of these proteins to introduce
an IPBD onto the phage surface by one of the
constructions:

1) ipbd::pmcp,

2) pmcp::ipbd,

3) signal::ipbd::pmcp, and

4) signal::pmcp::ipbd,

where ipbd represents DNA coding on expression for the
initial potential binding domain; pmcp represents DNA
coding for one of the phage minor coat proteins, VI, VII,
and IX; signal represents a functional secretion signal
peptide, such as the phoA signal (MKQSTIALALLPLLFTPVTKA);
and "::" represents in-frame genetic fusion. The
indicated fusions are placed downstream of a known promoter,
preferably a regulated promoter such as lacUV5, tac, or
trp. Fusions (1) and (2) are appropriate when the minor
coat protein attaches to the phage from the cytoplasm or
by autonomous insertion into the lipid bilayer. Fusion
(1) is appropriate if the amino terminus of the minor coat
protein is free and (2) is appropriate if the carboxy
terminus is free. Fusions (3) and (4) are appropriate if
the minor coat protein attaches to the phage from the
periplasm or from within the lipid bilayer. Fusion (3) is
appropriate if the amino terminus of the minor coat
protein is free and (4) is appropriate if the carboxy
terminus is free.
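
The choice among the four constructions can be written as a small decision table. The Python sketch below merely restates the rules given in the text; the enum and function names are hypothetical, and because the actual assembly route of these proteins is stated to be unknown, both inputs would have to be established experimentally.

```python
# Decision table for the four minor-coat-protein constructions.
# Enum and function names are illustrative assumptions.
from enum import Enum


class Route(Enum):
    CYTOPLASM = "attaches from the cytoplasm"
    AUTONOMOUS_INSERTION = "autonomous insertion into the lipid bilayer"
    PERIPLASM = "attaches from the periplasm"
    WITHIN_BILAYER = "attaches from within the lipid bilayer"


class FreeTerminus(Enum):
    AMINO = "amino"
    CARBOXY = "carboxy"


def choose_construction(route: Route, free: FreeTerminus) -> str:
    """Return the construction the text calls appropriate for these inputs."""
    # Fusions (3) and (4) carry a secretion signal; (1) and (2) do not.
    needs_signal = route in (Route.PERIPLASM, Route.WITHIN_BILAYER)
    # The IPBD goes on whichever terminus of the minor coat protein is free.
    core = "ipbd::pmcp" if free is FreeTerminus.AMINO else "pmcp::ipbd"
    return f"signal::{core}" if needs_signal else core


for route in Route:
    for free in FreeTerminus:
        print(f"{route.name:22s} {free.name:8s} {choose_construction(route, free)}")
```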

Bacteriophage φX174:

The bacteriophage φX174 is a very small icosahedral
virus which has been thoroughly studied by genetics,
biochemistry, and electron microscopy (see The
Single-Stranded DNA Phages (DENH78)). To date, no proteins
from φX174 have been studied by X-ray diffraction. φX174 is
not used as a cloning vector because φX174 can accept
very little additional DNA; the virus is so tightly
constrained that several of its genes overlap. Chambers et
al. (CHAM82) showed that mutants in gene G are rescued by
the wild-type G gene carried on a plasmid so that the host
supplies this protein.

Three gene products of φX174 are present on the
outside of the mature virion: F (capsid), G (major spike
protein, 60 copies per virion), and H (minor spike
protein, 12 copies per virion). The G protein comprises
175 amino acids, while H comprises 328 amino acids. The F
protein interacts with the single-stranded DNA of the
virus. The proteins F, G, and H are translated from a
single mRNA in the virally infected cells. If the G
protein is supplied from a plasmid in the host, then the
viral g gene is no longer essential. We introduce one or
more stop codons into g so that no G is produced from the
viral gene. We fuse a pbd gene fragment to h, either at
the 3' or 5' terminus. We eliminate an amount of the
viral g gene equal to the size of pbd so that the size of
the genome is unchanged.
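
The size-neutral substitution described above amounts to simple bookkeeping, sketched below in Python. This is an illustration under stated assumptions only: the function name is hypothetical, sizes are counted as three base pairs per residue with no allowance for stop codons, linkers, or the overlapping genes of φX174, and the 58-residue example is the length of mature BPTI. The 175- and 328-residue figures are from the text.

```python
# Bookkeeping sketch for keeping the phiX174 genome size unchanged.
# Sizes from the text; the function name and counting rules are
# illustrative assumptions (no stop codons, linkers, or gene overlaps).

G_RESIDUES = 175  # G (major spike) protein
H_RESIDUES = 328  # H (minor spike) protein


def size_neutral_plan(pbd_residues: int, g_residues: int = G_RESIDUES) -> dict:
    """Base pairs to add to h and to remove from g for zero net change."""
    if pbd_residues <= 0:
        raise ValueError("pbd_residues must be positive")
    insert_bp = 3 * pbd_residues
    g_bp = 3 * g_residues
    if insert_bp > g_bp:
        raise ValueError("pbd is larger than the g coding region")
    return {
        "insert_into_h_bp": insert_bp,
        "delete_from_g_bp": insert_bp,
        "net_change_bp": 0,
        "g_coding_bp_left": g_bp - insert_bp,
        "h_fusion_residues": H_RESIDUES + pbd_residues,
    }


plan = size_neutral_plan(58)  # 58 residues, the length of mature BPTI
print(plan)
```

The check that the inserted fragment is no larger than the g coding region reflects the constraint implied by the text: only as much of g can be deleted as g contains.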

Large DNA Phages

Phage such as λ or T4 have much larger genomes than
do M13 or φX174. Large genomes are less conveniently
manipulated than small genomes. Phage λ has such a large
genome that cassette mutagenesis is not practicable. One
cannot use annealing of a mutagenic oligonucleotide
either, because there is no ready supply of
single-stranded λ DNA. (λ DNA is packaged as double-stranded
DNA.) Phage such as λ and T4 have more complicated 3D
capsid structures than M13 or φX174, with more OSPs to
choose from. Intracellular morphogenesis of phage λ
could cause protein domains that contain disulfide bonds
in their folded forms not to fold.

Phage λ virions and phage T4 virions form
intracellularly, so that IPBDs requiring large or insoluble
prosthetic groups might fold on the surfaces of these
phage.

RNA Phages

RNA phage are not preferred because manipulation of
RNA is much less convenient than is the manipulation of
DNA. If the RNA phage MS2 were modified to make room for
an osp-ipbd gene and if a message containing the A protein
binding site and the gene for a chimera of coat protein
and a PBD were produced in a cell that also contained A
protein and wild-type coat protein (both produced from
regulated genes on a plasmid), then the RNA coding for the
chimeric protein would get packaged. A package comprising
RNA encapsulated by proteins encoded by that RNA satisfies
the major criterion that the genetic message inside the
package specifies something on the outside. The particles
by themselves are not viable unless the modified A protein
is functional. After isolating the packages that carry an
SBD, we would need to: 1) separate the RNA from the
protein capsid; 2) reverse transcribe the RNA into DNA,
using AMV or MMTV reverse transcriptase; and 3) use
Thermus aquaticus DNA polymerase for 25 or more cycles of
Polymerase Chain Reaction (PCR™) to amplify the osp-sbd DNA
until there is enough to subclone the recovered genetic
message into a plasmid for sequencing and further work.
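
To give a sense of what 25 or more cycles of amplification implies, the following Python sketch computes the theoretical fold-amplification. It is a rough illustration only: the function names and the per-cycle efficiency parameter are assumptions of the sketch, and the model ignores the plateau that real reactions reach.

```python
# Idealized PCR yield: each cycle multiplies the template by (1 + efficiency).
# Function names and the efficiency parameter are illustrative assumptions.
import math


def pcr_fold(cycles: int, efficiency: float = 1.0) -> float:
    """Theoretical fold-amplification after a number of cycles."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return (1.0 + efficiency) ** cycles


def cycles_needed(target_fold: float, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles reaching at least target_fold."""
    if target_fold < 1.0:
        raise ValueError("target_fold must be at least 1")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return math.ceil(math.log(target_fold) / math.log(1.0 + efficiency))


print(f"25 cycles, perfect doubling: {pcr_fold(25):.3g}-fold")
print(f"25 cycles, 80% efficiency:   {pcr_fold(25, 0.8):.3g}-fold")
print(f"cycles for a million-fold gain: {cycles_needed(1e6)}")
```

With perfect doubling, 25 cycles correspond to 2^25, or about 3.4 x 10^7-fold amplification, which indicates why a small number of recovered packages can in principle supply enough DNA for subcloning.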

Alternatively, helper phage could be used to rescue
the isolated phage. In one of these ways we can recover a
sequence that codes for an SBD having desirable binding
properties.

IV.C. Bacterial Cells as Genetic Packages:

One may choose any well-characterized bacterial
strain which (1) may be grown in culture, (2) may be
engineered to display PBDs on its surface, and (3) is
compatible with affinity selection.

Among bacterial cells, the preferred genetic packages
are Salmonella typhimurium, Bacillus subtilis, Pseudomonas
aeruginosa, Vibrio cholerae, Klebsiella pneumoniae,
Neisseria gonorrhoeae, Neisseria meningitidis,
Bacteroides nodosus, Moraxella bovis, and especially
Escherichia coli. The potential binding mini-protein may be
expressed as an insert in a chimeric bacterial outer surface
protein (OSP). All bacteria exhibit proteins on their outer
surfaces. Works on the localization of OSPs and the
methods of determining their structure include: CALA90,
HEIJ90, EHRM90, BENZ88a, BENZ88b, MANO88, BAKE87, RAND87,
HANC87, HENR87, NAKA86b, MANO86, SILH85, TOMM85, NIKA84,
LUGT83, and BECK83.

In E. coli, LamB is a preferred OSP. As discussed
below, there are a number of very good alternatives in
E. coli and there are very good alternatives in other
bacterial species. There are also methods for determining
the topology of OSPs so that it is possible to
systematically determine where to insert an ipbd into an osp
gene to obtain display of an IPBD on the surface of any
bacterial species.

In view of the extensive knowledge of E. coli, a
strain of E. coli, defective in recombination, is the
strongest candidate as a bacterial GP.

Oliver has reviewed mechanisms of protein secretion
in bacteria (OLIV85a and OLIV87). Nikaido and Vaara
(NIKA87), Benz (BENZ88b), and Baker et al. (BAKE87) have
reviewed mechanisms by which proteins become localized to
the outer membrane of gram-negative bacteria. While most
bacterial proteins remain in the cytoplasm, others are
transported to the periplasmic space (which lies between
the plasma membrane and the cell wall of gram-negative
bacteria), or are conveyed and anchored to the outer
surface of the cell. Still others are exported (secreted)
into the medium surrounding the cell. Those
characteristics of a protein that are recognized by a cell
and that cause it to be transported out of the cytoplasm and
displayed on the cell surface will be termed
"outer-surface transport signals".

Gram-negative bacteria have outer-membrane proteins
(OMP), that form a subset of OSPs. Many OMPs span the
membrane one or more times. The signals that cause OMPs
to localize in the outer membrane are encoded in the amino
acid sequence of the mature protein. Outer membrane
proteins of bacteria are initially expressed in a
precursor form including a so-called signal peptide. The
precursor protein is transported to the inner membrane,
and the signal peptide moiety is extruded into the
periplasmic space. There, it is cleaved off by a "signal
peptidase", and the remaining "mature" protein can now
enter the periplasm. Once there, other cellular
mechanisms recognize structures in the mature protein which
indicate that its proper place is on the outer membrane,
and transport it to that location.

It is well known that the DNA coding for the leader
or signal peptide from one protein may be attached to the
DNA sequence coding for another protein, protein X, to
form a chimeric gene whose expression causes protein X to
appear free in the periplasm (BECK83, INOU86 Ch10,
LEEC86, MARK86, and BOQU87). That is, the leader causes
the chimeric protein to be secreted through the lipid
bilayer; once in the periplasm, it is cleaved off by the
signal peptidase SP-I.

The use of export-permissive bacterial strains
(LISS85, STAD89) increases the probability that a
signal-sequence-fusion will direct the desired protein to the
cell surface. Liss et al. (LISS85) showed that the
mutation prlA4 makes E. coli more permissive with respect
to signal-sequences. Similarly, Stader et al. (STAD89)
found a strain that bears a prlG mutation and that permits
export of a protein that is blocked from export in
wild-type cells. Such export-permissive strains are preferred.

OSP-IPBD fusion proteins need not fill a structural
role in the outer membranes of Gram-negative bacteria
because parts of the outer membranes are not highly
ordered. For large OSPs there is likely to be one or more
sites at which osp can be truncated and fused to ipbd such
that cells expressing the fusion will display IPBDs on the
cell surface. Fusions of fragments of omp genes with
fragments of an x gene have led to X appearing on the
outer membrane (CHAR88b, BENS84, CLEM81). When such
fusions have been made, we can design an osp-ipbd gene by
substituting ipbd for x in the DNA sequence. Otherwise, a
successful OMP-IPBD fusion is preferably sought by fusing
fragments of the best omp to an ipbd, expressing the fused
gene, and testing the resultant GPs for display-of-IPBD
phenotype. We use the available data about the OMP to
pick the point or points of fusion between omp and ipbd to
maximize the likelihood that IPBD will be displayed.
(Spacer DNA encoding flexible linkers, made, e.g., of
GLY, SER, and ASN, may be placed between the osp- and
ipbd-derived fragments to facilitate display.)
Alternatively, we truncate osp at several sites or in a
manner that produces osp fragments of variable length and
fuse the osp fragments to ipbd; cells expressing the fusion
are screened or selected which display IPBDs on the cell
surface. Freudl et al. (FREU89) have shown that fragments
of OSPs (such as OmpA) above a certain size are
incorporated into the outer membrane. An additional
alternative is to include short segments of random DNA in
the fusion of omp fragments to ipbd and then screen or
select the resulting variegated population for members
exhibiting the display-of-IPBD phenotype.

In E. coli, the LamB protein is a well understood OSP
and can be used (BENS84, CHAR90, RONC90, VAND90, CHAP90,
MOLL90, CHAR88b, CHAR88c, CLEM81, DARG88, FERE82a,
FERE82b, FERE83, FERE84, FERE86a, FERE86b, FERE89a,
FERE89b, GEHR87, HALL82, NAKA86a, STAD86, HEIN88, BENS87b,
BENS87c, BOUG84, BOUL86a, CHAR84). The E. coli LamB has
been expressed in functional form in S. typhimurium
(DEVR84, BARB85, HARK87), V. cholerae (HARK86), and K.
pneumoniae (DEVR84, WEHM89), so that one could display a
population of PBDs in any of these species as a fusion to
E. coli LamB. K. pneumoniae expresses a maltoporin similar
to LamB (WEHM89) which could also be used. In P.
aeruginosa, the D1 protein (a homologue of LamB) can be
used (TRIA88).

LamB of E. coli is a porin for maltose and
maltodextrin transport, and serves as the receptor for
adsorption of bacteriophages λ and K10. LamB is transported
to the outer membrane if a functional N-terminal sequence is
present; further, the first 49 amino acids of the mature
sequence are required for successful transport (BENS84).
As with other OSPs, LamB of E. coli is synthesized with a
typical signal-sequence which is subsequently removed.
Homology between parts of LamB protein and other outer
membrane proteins OmpC, OmpF, and PhoE has been detected
(NIKA84), including homology between LamB amino acids
39-49 and sequences of the other proteins. These
subsequences may label the proteins for transport to the
outer membrane.

The amino acid sequence of LamB is known (CLEM81),
and a model has been developed of how it anchors itself to
the outer membrane (reviewed by, among others, BENZ88b).
The locations of its maltose and phage binding domains are
also known (HEIN88). Using this information, one may
identify several strategies by which a PBD insert may be
incorporated into LamB to provide a chimeric OSP which
displays the PBD on the bacterial outer membrane.

When the PBDs are to be displayed by a chimeric
transmembrane protein like LamB, the PBD could be inserted
into a loop normally found on the surface of the cell (cp.
BECK83, MANO86). Alternatively, we may fuse a 5' segment
of the osp gene to the ipbd gene fragment; the point of
fusion is picked to correspond to a surface-exposed loop
of the OSP and the carboxy terminal portions of the OSP
are omitted. In LamB, it has been found that up to 60
amino acids may be inserted (CHAR88b) with display of the
foreign epitope resulting; the structural features of
OmpC, OmpA, OmpF, and PhoE are so similar that one expects
similar behavior from these proteins.

It should be noted that while LamB may be
characterized as a binding protein, it is used in the present
invention to provide an OSTS; its binding domains are not
variegated.

Other bacterial outer surface proteins, such as OmpA,
OmpC, OmpF, PhoE, and pilin, may be used in place of LamB
and its homologues. OmpA is of particular interest
because it is very abundant and because homologues are
known in a wide variety of gram-negative bacterial
species. Baker et al. (BAKE87) review assembly of
proteins into the outer membrane of E. coli and cite a
topological model of OmpA (VOGE86) that predicts that
residues 19-32, 62-73, 105-118, and 147-158 are exposed on
the cell surface. Insertion of an ipbd-encoding fragment
at about codon 111 or at about codon 152 is likely to
cause the IPBD to be displayed on the cell surface.
Concerning OmpA, see also MACI88 and MANO88. Porin
Protein F of Pseudomonas aeruginosa has been cloned and
has sequence homology to OmpA of E. coli (DUCH88).
Although this homology is not sufficient to allow
prediction of surface-exposed residues on Porin Protein F,
the methods used to determine the topological model of OmpA
may be applied to Porin Protein F. Works related to use
of OmpA as an OSP include BECK80 and MACI88.
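
The relation between the predicted surface-exposed segments of OmpA and the suggested insertion codons can be checked with a short Python sketch. It is illustrative only: the residue ranges and the two insertion codons are those quoted above from the cited model, while the function names are hypothetical and the midpoint calculation is simply one convenient way to pick a position inside a segment, not a rule stated in the source.

```python
# Check candidate insertion codons against the predicted surface-exposed
# segments of OmpA quoted in the text. Function names are illustrative.

OMPA_EXPOSED = [(19, 32), (62, 73), (105, 118), (147, 158)]


def exposed_segment(residue: int, segments=OMPA_EXPOSED):
    """Return the (start, end) segment containing residue, or None."""
    for start, end in segments:
        if start <= residue <= end:
            return (start, end)
    return None


def segment_midpoints(segments=OMPA_EXPOSED) -> list[int]:
    """Integer midpoint of each predicted exposed segment."""
    return [(start + end) // 2 for start, end in segments]


for codon in (111, 152):
    print(codon, "lies in segment", exposed_segment(codon))

print("segment midpoints:", segment_midpoints())
```

Both suggested codons fall inside a predicted exposed segment and coincide with the integer midpoints of the third and fourth segments, which suggests that the remaining two segments could be probed in the same way.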

Misra and Benson (MISR88a, MISR88b) disclose a
topological model of E. coli OmpC that predicts that,
among others, residues GLY164 and LEU250 are exposed on
the cell surface. Thus insertion of an ipbd gene fragment
at about codon 164 or at about codon 250 of the E. coli
ompC gene or at corresponding codons of the S. typhimurium
ompC gene is likely to cause IPBD to appear on the cell
surface. The ompC genes of other bacterial species may be
used. Other works related to OmpC include CATR87 and
CLIC88.

OmpF of E. coli is a very abundant OSP, >10^4
copies/cell. Pages et al. (PAGE90) have published a model of
OmpF indicating seven surface-exposed segments. Fusion of
an ipbd gene fragment, either as an insert or to replace
the 3' part of ompF, in one of the indicated regions is
likely to produce a functional ompF::ipbd gene the
expression of which leads to display of IPBD on the cell
surface. In particular, fusion at about codon 111, 177,
217, or 245 should lead to a functional ompF::ipbd gene.
Concerning OmpF, see also REID88b, PAGE88, BENS88, TOMM82,
and SODE85.

Pilus proteins are of particular interest because
piliated cells express many copies of these proteins and
because several species (N. gonorrhoeae, P. aeruginosa,
Moraxella bovis, Bacteroides nodosus, and E. coli) express
related pilins. Getzoff and coworkers (GETZ88, PARG87,
SOME85) have constructed a model of the gonococcal pilus
that predicts that the protein forms a four-helix bundle
having structural similarities to tobacco mosaic virus
protein and myohemerythrin. On this model, both the amino
and carboxy termini of the protein are exposed. The amino
terminus is methylated. Elleman (ELLE88) has reviewed
pilins of Bacteroides nodosus and other species;
serotype differences can be related to differences in the
pilin protein, and most variation occurs in the
C-terminal region. The amino-terminal portions of the pilin
protein are highly conserved. Jennings et al. (JENN89)
have grafted a fragment of foot-and-mouth disease virus
(residues 144-159) into the B. nodosus type 4 fimbrial
protein which is highly homologous to gonococcal pilin.
They found that expression of the 3'-terminal fusion in P.
aeruginosa led to a viable strain that makes detectable
amounts of the fusion protein. Jennings et al. did not
vary the foreign epitope nor did they suggest any
variation. They inserted a GLY-GLY linker between the last
pilin residue and the first residue of the foreign epitope
to provide a "flexible linker". Thus a preferred place to
attach an IPBD is the carboxy terminus. The exposed loops
of the bundle could also be used, although the particular
internal fusions tested by Jennings et al. (JENN89)
appeared to be lethal in P. aeruginosa. Concerning pilin,
see also MCKE85 and ORND85.

Judd (JUDD86, JUDD85) has investigated Protein IA of
N. gonorrhoeae and found that the amino terminus is
exposed; thus, one could attach an IPBD at or near the
amino terminus of the mature P.IA as a means to display
the IPBD on the N. gonorrhoeae surface.

A model of the topology of PhoE of E. coli has been
disclosed by van der Ley et al. (VAND86). This model
predicts eight loops that are exposed; insertion of an
IPBD into one of these loops is likely to lead to display
of the IPBD on the surface of the cell. Residues 158,
201, 238, and 275 are preferred locations for insertion of
an IPBD.

Other OSPs that could be used include E. coli BtuB,
FepA, FhuA, IutA, FecA, and FhuE (GUDM89), which are
receptors for nutrients usually found in low abundance.
The genes of all these proteins have been sequenced, but
topological models are not yet available. Gudmunsdottir
et al. (GUDM89) have begun the construction of such a
model for BtuB and FepA by showing that certain residues
of BtuB face the periplasm and by determining the
functionality of various BtuB::FepA fusions. Carmel et al.
(CARM90) have reported work of a similar nature for FhuA.
All Neisseria species express outer surface proteins for
iron transport that have been identified and, in many
cases, cloned. See also MORS87 and MORS88.

Many gram-negative bacteria express one or more
phospholipases. E. coli phospholipase A, product of the
pldA gene, has been cloned and sequenced by de Geus et al.
(DEGE84). They found that the protein appears at the cell
surface without any posttranslational processing. An ipbd
gene fragment can be attached at either terminus or
inserted at positions predicted to encode loops in the
protein. That phospholipase A arrives on the outer
surface without removal of a signal sequence does not
prove that a PldA::IPBD fusion protein will also follow
this route. Thus we might cause a PldA::IPBD or
IPBD::PldA fusion to be secreted into the periplasm by
addition of an appropriate signal sequence. Thus, in
addition to simple binary fusion of an ipbd fragment to one
terminus of pldA, the constructions:

1) ss::ipbd::pldA

2) ss::pldA::ipbd

should be tested. Once the PldA::IPBD protein is free in
the periplasm it does not remember how it got there and
the structural features of PldA that cause it to localize
on the outer surface will direct the fusion to the same
destination.

IV.D. Bacterial Spores as Genetic Packages:

Bacterial spores have desirable properties as GP
candidates. Spores are much more resistant than
vegetative bacterial cells or phage to chemical and physical
agents, and hence permit the use of a great variety of
affinity selection conditions. Also, Bacillus spores
neither actively metabolize nor alter the proteins on
their surface. Spores have the disadvantage that the
molecular mechanisms that trigger sporulation are less
well worked out than is the formation of M13 or the export
of protein to the outer membrane of E. coli.

Bacteria of the genus Bacillus form endospores that
are extremely resistant to damage by heat, radiation,
desiccation, and toxic chemicals (reviewed by Losick et
al. (LOSI86)). This phenomenon is attributed to extensive
intermolecular crosslinking of the coat proteins.
Endospores from the genus Bacillus are more stable than
are exospores from Streptomyces. Bacillus subtilis forms
spores in 4 to 6 hours, but Streptomyces species may
require days or weeks to sporulate. In addition, genetic
knowledge and manipulation is much more developed for B.
subtilis than for other spore-forming bacteria. Thus
Bacillus spores are preferred over Streptomyces spores.
Bacteria of the genus Clostridium also form very durable
endospores, but Clostridia, being strict anaerobes, are
not convenient to culture.

& Viable spores that differ only slightly from wild- 

^ 15 type are produced in |L_ subtilis even xf any one of four 

hj coat proteins is missing (D0NC87) . Moreover, plasmid DNA 

03 is commonly included in spores, and plasmid encoded 

^ proteins have been observed on the surface of Bacillus 

~" spores (DEBR86) ♦ For these reasons, we expect that it 

0 20 will be possible to express during sporulation a gene 

encoding a chimeric coat protein, without interfering 
"[ft materially with spore formation. 

Donovan et al. have identified several polypeptide
components of the B. subtilis spore coat (DONO87); the
sequences of two complete coat proteins and amino-terminal
fragments of two others have been determined. Some, but
not all, of the coat proteins are synthesized as precursors
and are then processed by specific proteases before
deposition in the spore coat (DONO87). The 12 kd coat
protein, CotD, contains 5 cysteines. CotD also contains
an unusually high number of histidines (16) and prolines
(7). The 11 kd coat protein, CotC, contains only one
cysteine and one methionine. CotC has a very unusual
amino-acid sequence with 19 lysines (K) appearing as 9 K-K
dipeptides and one isolated K. There are also 20 tyrosines
(Y), of which 10 appear as 5 Y-Y dipeptides.
Peptides rich in Y and K are known to become crosslinked
in oxidizing environments (DEVO78, WAIT83, WAIT85,
WAIT86). CotC contains 16 D and E amino acids, a number that
nearly equals the 19 Ks. There are no A, F, R, I, L, N, P, Q, S,
or W amino acids in CotC. Neither CotC nor CotD is
post-translationally cleaved, but the proteins CotA and CotB
are.

Since, in B. subtilis, some of the spore coat
proteins are post-translationally processed by specific
proteases, it is valuable to know the sequences of
precursors and mature coat proteins so that we can avoid
incorporating the recognition sequence of the specific
protease into our construction of an OSP-IPBD fusion. The
sequence of a mature spore coat protein contains information
that causes the protein to be deposited in the spore
coat; thus gene fusions that include some or all of a
mature coat protein sequence are preferred for screening
or selection for the display-of-IPBD phenotype.

Fusions of ipbd fragments to cotC or cotD fragments
are likely to cause IPBD to appear on the spore surface.
The genes cotC and cotD are preferred osp genes because
CotC and CotD are not post-translationally cleaved.
Subsequences from cotA or cotB could also be used to cause
an IPBD to appear on the surface of B. subtilis spores,
but we must take the post-translational cleavage of these
proteins into account. DNA encoding IPBD could be fused
to a fragment of cotA or cotB at either end of the coding
region or at sites interior to the coding region. Spores
could then be screened or selected for the display-of-IPBD
phenotype.

The promoter of a spore coat protein is most active:
a) when spore coat protein is being synthesized and
deposited onto the spore and b) in the specific place that
spore coat proteins are being made. The sequences of
several sporulation promoters are known; coding sequences
operatively linked to such promoters are expressed only
during sporulation. Ray et al. (RAYC87) have shown that
the G4 promoter of B. subtilis is directly controlled by
RNA polymerase bound to σE. To date, no Bacillus sporulation
promoter has been shown to be inducible by an
exogenous chemical inducer as is the lac promoter of E. coli.
Nevertheless, the quantity of protein produced from a
sporulation promoter can be controlled by other factors,
such as the DNA sequence around the Shine-Dalgarno
sequence or codon usage. Chemically inducible sporulation
promoters can be developed if necessary.

IV. E. Artificial OSPs

It is generally preferable to use as the genetic
package a cell, spore or virus for which an outer surface
protein that can be engineered to display an IPBD has
already been identified. However, the present invention
is not limited to such genetic packages.

It is believed that the conditions for an outer
surface transport signal in a bacterial cell or spore are
not particularly stringent, i.e., a random polypeptide of
appropriate length (preferably 30-100 amino acids) has a
reasonable chance of providing such a signal. Thus, by
constructing a chimeric gene comprising a segment encoding
the IPBD linked to a segment of random or pseudorandom DNA
(the potential OSTS), and placing this gene under control
of a suitable promoter, there is a possibility that the
chimeric protein so encoded will function as an OSP-IPBD.

This possibility is greatly enhanced by constructing
numerous such genes, each having a different potential
OSTS, cloning them into a suitable host, and selecting for
transformants bearing the IPBD (or other marker) on their
outer surface. Use of secretion-permissive mutants, such
as prlA4 (LISS85) or prlG (STAD89), can increase the
probability of obtaining a working OSP-IPBD.
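The benefit of constructing numerous genes can be made concrete with a short calculation. The following is a minimal sketch, assuming that each clone independently has the same small probability of carrying a functional OSTS; that per-clone probability is not given in the text and is purely a hypothetical input, as are the function names.

```python
import math


def prob_at_least_one(p_single: float, n_clones: int) -> float:
    """Probability that at least one of n_clones carries a working OSTS.

    Assumes clones are independent and share the same success
    probability p_single (a hypothetical value, not taken from the text).
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie in [0, 1]")
    return 1.0 - (1.0 - p_single) ** n_clones


def clones_needed(p_single: float, target: float = 0.95) -> int:
    """Smallest library size giving at least `target` chance of one success."""
    if not 0.0 < p_single < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p_single and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single))


if __name__ == "__main__":
    # Illustrative only: if one random segment in a thousand works,
    # how many independent clones give a 95% chance of at least one hit?
    print(clones_needed(0.001, 0.95))
```

Under these assumptions, even a low per-clone success rate is overcome by a library of modest size, which is the point of selecting from many potential OSTSs rather than testing one.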

When seeking to display an IPBD on the surface of a
bacterial cell, as an alternative to choosing a natural
OSP and an insertion site in the OSP, we can construct a
gene (the "display probe") comprising: a) a regulatable
promoter (e.g., lacUV5), b) a Shine-Dalgarno sequence, c) a
periplasmic transport signal sequence, d) a fusion of the
ipbd gene with a segment of random DNA (as in Kaiser et
al. (KAIS87)), e) a stop codon, and f) a transcriptional
terminator.

When the genetic package is a spore, we can use the
approach described above for attaching an IPBD to an E.
coli cell, except that: a) a sporulation promoter is used,
and b) no periplasmic signal sequence should be present.

For phage, because the OSP-IPBD fulfills a structural
role in the phage coat, it is unlikely that any particular
random DNA sequence coupled to the ipbd gene will produce
a fusion protein that fits into the coat in a functional
way. Nevertheless, random DNA inserted between large
fragments of a coat protein gene and the pbd gene will
produce a population that is likely to contain one or more
members that display the IPBD on the outside of a viable
phage.

As previously stated, the purpose of the random DNA
is to encode an OSTS, like that embodied in known OSPs.
The fusion of ipbd and the random DNA could be in either
order, but ipbd upstream is slightly preferred. Isolates
from the population generated in this way can be screened
for display of the IPBD. Preferably, a version of
selection-through-binding is used to select GPs that
display IPBD on the GP surface. Alternatively, clonal
isolates of GPs may be screened for the display-of-IPBD
phenotype.

The preference for ipbd upstream of the random DNA
arises from consideration of the manner in which the
successful GP(IPBD) will be used. The present invention
contemplates introducing numerous mutations into the pbd
region of the osp-pbd gene, which, depending on the
variegation scheme, might include gratuitous stop codons.
If pbd precedes the random DNA, then gratuitous stop
codons in pbd lead to no OSP-PBD protein appearing on the
cell surface. If pbd follows the random DNA, then
gratuitous stop codons in pbd might lead to incomplete
OSP-PBD proteins appearing on the cell surface. Incomplete
proteins often are non-specifically sticky so that
GPs displaying incomplete PBDs are easily removed from the
population.
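How often gratuitous stop codons arise depends on the variegation scheme, and this can be estimated by simple enumeration. The sketch below assumes the standard genetic code (stop codons TAA, TAG, TGA) and uses three degenerate-codon schemes written in IUPAC notation (NNN, NNK, NNS). These schemes are chosen here only for illustration and are not prescribed by the text above; the function names are likewise hypothetical.

```python
from itertools import product

# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Subset of IUPAC nucleotide codes needed for the example schemes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT",  # any base
    "K": "GT",    # keto
    "S": "CG",    # strong
}


def stop_fraction(scheme: str) -> float:
    """Fraction of the codons encoded by a 3-letter degenerate scheme that are stops.

    Assumes every allowed base at each position is equally likely.
    """
    if len(scheme) != 3:
        raise ValueError("scheme must have exactly three positions")
    codons = ["".join(bases) for bases in product(*(IUPAC[b] for b in scheme))]
    return sum(codon in STOP_CODONS for codon in codons) / len(codons)


def prob_no_stop(scheme: str, n_codons: int) -> float:
    """Probability that n_codons independently variegated codons contain no stop."""
    return (1.0 - stop_fraction(scheme)) ** n_codons


if __name__ == "__main__":
    for scheme in ("NNN", "NNK", "NNS"):
        print(scheme, stop_fraction(scheme), round(prob_no_stop(scheme, 6), 3))
```

A scheme that excludes two of the three stop codons roughly halves the fraction of variegated genes that are terminated prematurely, which matters more as the number of variegated codons grows.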

The random DNA may be obtained in a variety of ways.
Degenerate synthetic DNA is one possibility. Alternatively,
pseudorandom DNA can be generated from any DNA having
high sequence diversity, e.g., the genome of the organism,
by partially digesting with an enzyme that cuts very
often, e.g., Sau3A I. Alternatively, one could shear DNA
having high sequence diversity, blunt the sheared DNA with
the large fragment of E. coli DNA polymerase I (hereinafter
referred to as Klenow fragment), and clone the
sheared and blunted DNA into blunt sites of the vector
(MANI82, p295, AUSU87).
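Partial digestion with a frequent cutter can be pictured with a small simulation. The following is a rough sketch, assuming that Sau3A I cuts immediately 5' of each GATC site, that only the top strand needs to be modeled, and that each site is cut independently with a fixed probability; the probability and the function names are hypothetical.

```python
import random

SAU3AI_SITE = "GATC"  # Sau3A I recognition site; the cut falls just 5' of the G


def cut_positions(seq: str, site: str = SAU3AI_SITE) -> list[int]:
    """Return the 0-based index of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    return positions


def digest(seq: str, cuts: list[int]) -> list[str]:
    """Split `seq` at the given cut positions, dropping empty fragments."""
    seq = seq.upper()
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def partial_digest(seq: str, p_cut: float = 0.3, rng=None) -> list[str]:
    """Cut each site independently with probability p_cut (an assumed model)."""
    rng = rng or random.Random(0)
    chosen = [c for c in cut_positions(seq) if rng.random() < p_cut]
    return digest(seq, chosen)


if __name__ == "__main__":
    demo = "AAGATCTTGATCGG"  # hypothetical sequence with two GATC sites
    print(digest(demo, cut_positions(demo)))  # complete digest
    print(partial_digest(demo, p_cut=0.5))    # one possible partial digest
```

Because different molecules are cut at different subsets of sites, a partial digest yields overlapping fragments of many lengths, which is what makes it a convenient source of pseudorandom DNA.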

If random DNA and phenotypic selection or screening
are used to obtain a GP(IPBD), then we clone random DNA
into one of the restriction sites that was designed into
the display probe. A plasmid carrying the display probe
is digested with the appropriate restriction enzyme and
the fragmented, random DNA is annealed and ligated by
standard methods. The ligated plasmids are used to
transform cells that are grown and selected for expression
of the antibiotic-resistance gene. Plasmid-bearing GPs
are then selected for the display-of-IPBD phenotype by the
affinity selection methods described hereafter, using
AfM(IPBD) as if it were the target.

As an alternative to selecting GP(IPBD)s through
binding to an affinity column, we can isolate colonies or
plaques and screen for successful artificial OSPs through
use of one of the methods listed below for verification of
the display strategy.

IV. F. Designing the osp-ipbd gene insert:

Genetic Construction and Expression Considerations

The (i)pbd-osp gene may be: a) completely synthetic,
b) a composite of natural and synthetic DNA, or c) a
composite of natural DNA fragments. The important point
is that the pbd segment be easily variegated so as to
encode a multitudinous and diverse family of PBDs as
previously described. A synthetic ipbd segment is
preferred because it allows greatest control over placement
of restriction sites. Primers complementary to
regions abutting the osp-ipbd gene on its 3' flank and to
parts of the osp-ipbd gene that are not to be varied are
needed for sequencing.

The sequences of regulatory parts of the gene are
taken from the sequences of natural regulatory elements:
a) promoters, b) Shine-Dalgarno sequences, and c) transcriptional
terminators. Regulatory elements could also be
designed from knowledge of consensus sequences of natural
regulatory regions. The sequences of these regulatory
elements are connected to the coding regions; restriction
sites are also inserted in or adjacent to the regulatory
regions to allow convenient manipulation.

The essential function of the affinity separation is
to separate GPs that bear PBDs (derived from IPBD) having
high affinity for the target from GPs bearing PBDs having
low affinity for the target. If the elution volume of a
GP depends on the number of PBDs on the GP surface, then a
GP bearing many PBDs with low affinity, GP(PBD_w), might
co-elute with a GP bearing fewer PBDs with high affinity,
GP(PBD_s). Regulation of the osp-pbd gene preferably is
such that most packages display sufficient PBD to effect a
good separation according to affinity. Use of a regulatable
promoter to control the level of expression of the
osp-pbd allows fine adjustment of the chromatographic
behavior of the variegated population.

Induction of synthesis of engineered genes in
vegetative bacterial cells has been exercised through the
use of regulated promoters such as lacUV5, trpP, or tac
(MANI82). The factors that regulate the quantity of
protein synthesized include: a) promoter strength (cf.
HOOP87), b) rate of initiation of translation (cf.
GOLD87), c) codon usage, d) secondary structure of mRNA,
including attenuators (cf. LAND87) and terminators (cf.
YAGE87), e) interaction of proteins with mRNA (cf. MCPH86,
MILL87b, WINT87), f) degradation rates of mRNA (cf.
BRAW87, KING86), and g) proteolysis (cf. GOTT87). These
factors are sufficiently well understood that a wide
variety of heterologous proteins can now be produced in E.
coli, B. subtilis and other host cells in at least
moderate quantities (SKER88, BETT88). Preferably, the
promoter for the osp-ipbd gene is subject to regulation by
a small chemical inducer. For example, the lac promoter
and the hybrid trp-lac (tac) promoter are regulatable with
isopropyl thiogalactoside (IPTG). Hereinafter, we use
"XINDUCE" as a generic term for a chemical that induces
expression of a gene. The promoter for the constructed
gene need not come from a natural osp gene; any regulatable
bacterial promoter can be used.

Transcriptional regulation of gene expression is best
understood and most effective, so we focus our attention
on the promoter. If transcription of the osp-ipbd gene is
controlled by the chemical XINDUCE, then the number of
OSP-IPBDs per GP increases for increasing concentrations
of XINDUCE until a fall-off in the number of viable
packages is observed or until sufficient IPBD is observed
on the surface of harvested GP(IPBD)s. The attributes
that affect the maximum number of OSP-IPBDs per GP are
primarily structural in nature. There may be steric
hindrance or other unwanted interactions between IPBDs if
OSP-IPBD is substituted for every wild-type OSP. Excessive
levels of OSP-IPBD may also adversely affect the
solubility or morphogenesis of the GP. For cellular and
viral GPs, as few as five copies of a protein having
affinity for another immobilized molecule have resulted in
successful affinity separations (FERE82a, FERE82b, and
SMIT85).

A non-leaky promoter is preferred. Non-leakiness is
useful: a) to show that affinity of GP(osp-ipbd)s for
AfM(IPBD) is due to the osp-ipbd gene, and b) to allow
growth of GP(osp-ipbd) in the absence of XINDUCE if the
expression of osp-ipbd is disadvantageous. The lacUV5
promoter in conjunction with the LacIq repressor is a
preferred example.

An exemplary osp-ipbd gene has the DNA sequence shown
in Table 25 and there annotated to explain the useful
restriction sites and biologically important features,
viz. the lacUV5 promoter, the lacO operator, the
Shine-Dalgarno sequence, the amino acid sequence, the stop
codons, and the trp attenuator transcriptional terminator.

The present invention is not limited to a single
method of gene design. The osp-ipbd gene need not be
synthesized in toto; parts of the gene may be obtained
from nature. One may use any genetic engineering method
to produce the correct gene fusion, so long as one can
easily and accurately direct mutations to specific sites
in the pbd DNA subsequence. In all of the methods of
mutagenesis considered in the present invention, however,
it is necessary that the coding sequence for the osp-ipbd
gene be different from any other DNA in the OCV. The
degree and nature of difference needed is determined by
the method of mutagenesis to be used. If the method of
mutagenesis is to be replacement of subsequences coding
for the PBD with vgDNA, then the subsequences to be
mutagenized are preferably bounded by restriction sites
that are unique with respect to the rest of the OCV. Use
of non-unique sites involves partial digestion, which is
less efficient than complete digestion of a unique site
and is not preferred. If single-stranded-oligonucleotide-directed
mutagenesis is to be used, then the DNA sequence
of the subsequence coding for the IPBD must be unique with
respect to the rest of the OCV.

The coding portions of genes to be synthesized are
designed at the protein level and then encoded in DNA.
The amino acid sequences are chosen to achieve various
goals, including: a) display of an IPBD on the surface of a
GP, b) change of charge on an IPBD, and c) generation of a
population of PBDs from which to select an SBD. These
issues are discussed in more detail below. The ambiguity in
the genetic code is exploited to allow optimal placement
of restriction sites and to create various distributions
of amino acids at variegated codons.

While the invention does not require any particular
number or placement of restriction sites, it is generally
preferable to engineer restriction sites into the gene to
facilitate subsequent manipulations. Preferably, the gene
provides a series of fairly uniformly spaced unique
restriction sites with no more than a preset maximum
number of bases, for example 100, between sites. Preferably,
the gene is designed so that its insertion into the
OCV does not destroy the uniqueness of unique restriction
sites of the OCV. Preferred recognition sites are those
for restriction enzymes which a) generate cohesive ends,
b) have unambiguous recognition, or c) have higher
specific activity.
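The spacing preference can be checked mechanically for a candidate design. The sketch below is a minimal illustration: it finds enzymes whose recognition site occurs exactly once in a sequence and reports the largest gap between neighboring unique sites. The three enzymes listed (EcoRI, BamHI, HindIII, with their well-known recognition sequences) and the function names are chosen for illustration only; they are not a panel specified by the text.

```python
# Illustrative enzyme panel; any set of recognition sequences could be used.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def unique_sites(seq: str, enzymes: dict[str, str] = ENZYMES) -> dict[str, int]:
    """Map enzyme name -> 0-based site position, for enzymes that cut exactly once."""
    seq = seq.upper()
    found = {}
    for name, site in enzymes.items():
        hits = [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]
        if len(hits) == 1:
            found[name] = hits[0]
    return found


def max_gap(seq: str, enzymes: dict[str, str] = ENZYMES):
    """Largest distance in bases between neighboring unique sites, or None if fewer than two."""
    positions = sorted(unique_sites(seq, enzymes).values())
    if len(positions) < 2:
        return None
    return max(b - a for a, b in zip(positions, positions[1:]))


def spacing_ok(seq: str, limit: int = 100, enzymes: dict[str, str] = ENZYMES) -> bool:
    """True if at least two unique sites exist and no gap between them exceeds `limit`."""
    gap = max_gap(seq, enzymes)
    return gap is not None and gap <= limit


if __name__ == "__main__":
    # Hypothetical gene: three sites separated by poly-T filler.
    gene = "GAATTC" + "T" * 50 + "GGATCC" + "T" * 20 + "AAGCTT"
    print(unique_sites(gene), max_gap(gene), spacing_ok(gene))
```

The same check can be run on the gene inserted into the OCV sequence to confirm that insertion has not duplicated a site that was unique in the vector.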

The ambiguity of the DNA between the restriction
sites is resolved from the following considerations. If
the given amino acid sequence occurs in the recipient
organism, and if the DNA sequence of the gene in the
organism is known, then, preferably, we maximize the
differences between the engineered and natural genes to
minimize the potential for recombination. In addition,
the following codons are poorly translated in E. coli and,
therefore, are avoided if possible: cta (L), cga (R), cgg
(R), and agg (R). For other host species, different codon
restrictions would be appropriate. Finally, long repeats
of any one base are prone to mutation and thus are
avoided. Balancing these considerations, we can design a
DNA sequence.

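Two of these considerations lend themselves to automatic screening of a draft sequence. The following is a minimal sketch that flags the four poorly translated E. coli codons named above and reports long runs of a single base. The run-length threshold of six is an assumption made for illustration (the text gives no number), and the function names are hypothetical.

```python
import re

# Codons the text lists as poorly translated in E. coli, with the residue encoded.
AVOIDED_CODONS = {"CTA": "L", "CGA": "R", "CGG": "R", "AGG": "R"}


def flag_codons(cds: str) -> list[tuple[int, str]]:
    """Return (1-based codon number, codon) for each avoided codon in frame."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [
        (i // 3 + 1, cds[i:i + 3])
        for i in range(0, len(cds), 3)
        if cds[i:i + 3] in AVOIDED_CODONS
    ]


def homopolymer_runs(seq: str, min_len: int = 6) -> list[tuple[int, int]]:
    """Return (0-based start, run length) for runs of one base at least min_len long.

    min_len = 6 is an assumed threshold, not a value from the text.
    """
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    print(flag_codons("ATGCTACGTAGG"))        # hypothetical draft coding sequence
    print(homopolymer_runs("ACGTAAAAAAAGT"))  # hypothetical sequence with a 7-base run
```

Flagged positions can then be recoded with synonymous codons, provided the change does not disturb a designed restriction site.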
Structural Considerations 

The design of the amino-acid sequence that the ipbd-osp
gene is to encode involves a number of structural
considerations. The design is somewhat different for
each type of GP. In bacteria, OSPs are not essential, so
there is no requirement that the OSP domain of a fusion
have any of its parental functions beyond lodging in the
outer membrane.

Relationship between PBD and OSP 

It is not required that the PBD and OSP domains have 
any particular spatial relationship; hence the process of 
this invention does not require use of the method of US
Patent '692.

It is, in fact, desirable that the OSP not constrain
the orientation of the PBD domain; this is not to be
confused with lack of constraint within the PBD. Cwirla
et al. (CWIR90), Scott and Smith (SCOT90), and Devlin et
al. (DEVL90) have taught that variable residues in
phage-displayed random peptides should be free of influence from
the phage OSP. We teach that binding domains having a
moderate to high degree of conformational constraint will
exhibit higher specificity and that higher affinity is
also possible. Thus, we prescribe picking codons for
variegation that specify amino acids that will appear in a
well-defined framework. The nature of the side groups is
varied through a very wide range due to the combinatorial
replacement of multiple amino acids. The main chain
conformations of most PBDs of a given class are very
similar. The movement of the PBD relative to the OSP
should not, however, be restricted. Thus it is often
appropriate to include a flexible linker between the PBD
and the OSP. Such flexible linkers can be taken from
naturally occurring proteins known to have flexible
regions. For example, the gIII protein of M13 contains
glycine-rich regions thought to allow the amino-terminal
domains a high degree of freedom. Such flexible linkers
may also be designed. Segments of polypeptides that are
rich in the amino acids GLY, ASN, SER, and ASP are likely
to give rise to flexibility. Multiple glycines are
particularly preferred.

Constraints imposed by OSP 

When we choose to insert the PBD into a surface loop
of an OSP such as LamB, OmpA, or M13 gIII protein, there
are a few considerations that do not arise when PBD is
joined to the end of an OSP. In these cases, the OSP
exerts some constraining influence on the PBD; the ends of
the PBD are held in more or less fixed positions. We
could insert a highly varied DNA sequence into the osp
gene at codons that encode a surface-exposed loop and
select for cells that have a specific-binding phenotype.
When the identified amino-acid sequence is synthesized (by
any means), the constraint of the OSP is lost and the
peptide is likely to have a much lower affinity for the
target and a much lower specificity. Tan and Kaiser
(TANN77) found that a synthetic model of BPTI containing
all the amino acids of BPTI that contact trypsin has a K_d
for trypsin about 10^7 higher than that of BPTI. Thus, it is strongly
preferred that the varied amino acids be part of a PBD in
which the structural constraints are supplied by the PBD.

It is known that the amino acids adjoining foreign
epitopes inserted into LamB influence the immunological
properties of these epitopes (VAND90). We expect that
PBDs inserted into loops of LamB, OmpA, or similar OSPs
will be influenced by the amino acids of the loop and by
the OSP in general. To obtain appropriate display of the
PBD, it may be necessary to add one or more linker amino
acids between the OSP and the PBD. Such linkers may be
taken from natural proteins or designed on the basis of
our knowledge of the structural behavior of amino acids.
Sequences rich in GLY, SER, ASN, ASP, ARG, and THR are
appropriate. One to five amino acids at either junction
are likely to impart the desired degree of flexibility
between the OSP and the PBD.

Phage OSP 

A preferred site for insertion of the ipbd gene into
the phage osp gene is one in which: a) the IPBD folds into
its original shape, b) the OSP domains fold into their
original shapes, and c) there is no interference between
the two domains.

If there is a model of the phage that indicates that
either the amino or carboxy terminus of an OSP is exposed
to solvent, then the exposed terminus of that mature OSP
becomes the prime candidate for insertion of the ipbd
gene. A low resolution 3D model suffices.

In the absence of a 3D structure, the amino and
carboxy termini of the mature OSP are the best candidates
for insertion of the ipbd gene. A functional fusion may
require additional residues between the IPBD and OSP
domains to avoid unwanted interactions between the
domains. Random-sequence DNA, or DNA coding for a specific
sequence of a protein homologous to the IPBD or OSP, can
be inserted between the osp fragment and the ipbd fragment
if needed.

Fusion at a domain boundary within the OSP is also a
good approach for obtaining a functional fusion. Smith
exploited such a boundary when subcloning heterologous DNA
into gene III of f1 (SMIT85).

The criteria for identifying OSP domains suitable for
causing display of an IPBD are somewhat different from
those used to identify an IPBD. When identifying an OSP,
minimal size is not so important because the OSP domain
will not appear in the final binding molecule nor will we
need to synthesize the gene repeatedly in each variegation
round. The major design concerns are that: a) the
OSP::IPBD fusion causes display of IPBD, b) the initial
genetic construction be reasonably convenient, and c) the
osp::ipbd gene be genetically stable and easily manipulated.
There are several methods of identifying domains.
Methods that rely on atomic coordinates have been reviewed
by Janin and Chothia (JANI85). These methods use matrices
of distances between α carbons (C_α), dividing planes (cf.
ROSE85), or buried surface (RASH84). Chothia and collaborators
have correlated the behavior of many natural
proteins with domain structure (according to their
definition). Rashin correctly predicted the stability of
a domain comprising residues 206-316 of thermolysin
(VITA84, RASH84).

Many researchers have used partial proteolysis and
protein sequence analysis to isolate and identify stable
domains. (See, for example, VITA84, POTE83, SCOT87a, and
PABO79.) Pabo et al. used calorimetry as an indicator
that the cI repressor from the coliphage λ contains two
domains; they then used partial proteolysis to determine
the location of the domain boundary.

If the only structural information available is the
amino acid sequence of the candidate OSP, we can use the
sequence to predict turns and loops. There is a high
probability that some of the loops and turns will be
correctly predicted (cf. Chou and Fasman (CHOU74)); these
locations are also candidates for insertion of the ipbd
gene fragment.

Bacterial OSPs

In bacterial OSPs, the major considerations are: a)
that the PBD is displayed, and b) that the chimeric
protein not be toxic.

From topological models of OSPs, we can determine
whether the amino or carboxy terminus of the OSP is
exposed. If so, then the exposed terminus is an excellent choice for
fusion of the osp fragment to the ipbd fragment.

The lamB gene has been sequenced and is available on
a variety of plasmids (CLEM81, CHAR88). Numerous fusions
of fragments of lamB with a variety of other genes have
been used to study export of proteins in E. coli. From
various studies, Charbit et al. (CHAR88) have proposed a
model that specifies which residues of LamB are: a)
embedded in the membrane, b) facing the periplasm, and c)
facing the cell surface; we adopt the numbering of this
model for amino acids in the mature protein. According to
this model, several loops on the outer surface are
defined, including: 1) residues 88 through 111, 2)
residues 145 through 165, and 3) residues 236 through 251.

Consider a mini-protein embedded in LamB. For
example, insertion of DNA encoding
G_1-N-X-C-X_5-X-X-X-C-X_10-S-G_12
between codons 153 and 154 of lamB is likely to lead to a
wide variety of LamB derivatives being expressed on the
surface of E. coli cells. G_1, N_2, S_11, and G_12 are
supplied to allow the mini-protein sufficient orientational
freedom that it can interact optimally with the target.
Using affinity enrichment (involving, for example, FACS
via a fluorescently labeled target, perhaps through
several rounds of enrichment), we might obtain a strain
(named, for example, BEST) that expresses a particular
LamB derivative that shows high affinity for the predetermined
target. An octapeptide having the sequence of the
inserted residues 3 through 10 from BEST is likely to have
an affinity and specificity similar to that observed in
BEST because the octapeptide has an internal structure
that keeps the amino acids in a conformation that is quite
similar in the LamB derivative and in the isolated
mini-protein.
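The bookkeeping for this example is simple enough to sketch in code. The following is a minimal illustration, assuming the 12-residue insert pattern G-N-X-C-X-X-X-X-C-X-S-G with X standing for any of the 20 amino acids; the selected sequence used in the demonstration is invented, and the function names are hypothetical.

```python
# Insert pattern from the example: G1 N2 X3 C4 X5 X6 X7 X8 C9 X10 S11 G12.
TEMPLATE = "GNXCXXXXCXSG"


def matches_template(insert: str, template: str = TEMPLATE) -> bool:
    """True if the insert has the fixed residues of the template at the right positions."""
    insert = insert.upper()
    if len(insert) != len(template):
        return False
    return all(t == "X" or t == a for t, a in zip(template, insert))


def core_octapeptide(insert: str) -> str:
    """Return inserted residues 3 through 10 (1-based), i.e. without the G-N and S-G linkers."""
    if not matches_template(insert):
        raise ValueError("insert does not fit the G-N-X-C-X-X-X-X-C-X-S-G pattern")
    return insert.upper()[2:10]


def library_size(template: str = TEMPLATE, alphabet: int = 20) -> int:
    """Number of distinct amino-acid sequences if every X may be any of `alphabet` residues."""
    return alphabet ** template.count("X")


if __name__ == "__main__":
    selected = "GNACKLMPCTSG"  # invented example of an insert recovered from a strain
    print(core_octapeptide(selected), library_size())
```

With six varied positions, the pattern spans 20^6 = 64,000,000 amino-acid sequences, and the two fixed cysteines remain inside the octapeptide that would be synthesized separately.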

Consideration of the Signal Peptide 

Fusing one or more new domains to a protein may make
the ability of the new protein to be exported from the
cell different from the ability of the parental protein.
The signal peptide of the wild-type coat protein may
function for the authentic polypeptide but be unable to direct
export of a fusion. To utilize the Sec-dependent pathway,
one may need a different signal peptide. Thus, to express
and display a chimeric BPTI/M13 gene VIII protein, we
found it necessary to utilize a heterologous signal
peptide (that of phoA).

Provision of a means to remove PBD from the GP

GPs that display peptides having high affinity for
the target may be quite difficult to elute from the
target, particularly a multivalent target. (Bacteria that
are bound very tightly can simply multiply in situ.) For
phage, one can introduce a cleavage site for a specific
protease, such as blood-clotting Factor Xa, into the
fusion OSP protein so that the binding domain can be
cleaved from the genetic package. Such cleavage has the
advantage that all resulting phage have identical OSPs and
therefore are equally infective, even if
polypeptide-displaying phage can be eluted from the affinity matrix
without cleavage. This step allows recovery of valuable
genes which might otherwise be lost. To our knowledge, no
one has disclosed or suggested using a specific protease
as a means to recover an information-containing genetic
package or of converting a population of phage that vary
in infectivity into phage having identical infectivity.

IV. G. Synthesis of Gene Inserts

The present invention is not limited as to how a
designed DNA sequence is divided for easy synthesis. An
established method is to synthesize both strands of the
entire gene in overlapping segments of 20 to 50 nucleotides
(nts) (THER88). An alternative method that is more
suitable for synthesis of vgDNA is an adaptation of
methods published by Oliphant et al. (OLIP86 and OLIP87)
and Ausubel et al. (AUSU87). It differs from previous
methods in that it: a) uses two synthetic strands, and b)
does not cut the extended DNA in the middle. Our goals
are: a) to produce longer pieces of dsDNA than can be
synthesized as ssDNA on commercial DNA synthesizers, and
b) to produce strands complementary to single-stranded
vgDNA. By using two synthetic strands, we remove the
requirement for a palindromic sequence at the 3' end.

DNA synthesizers can currently produce oligo-nts of
lengths up to 200 nts in reasonable yield, M_DNA = 200.
The parameters N_w (the length of overlap needed to obtain
efficient annealing) and N_s (the number of spacer bases
needed so that a restriction enzyme can cut near the end
of blunt-ended dsDNA) are determined by DNA and enzyme
chemistry. N_w = 10 and N_s = 5 are reasonable values.
Larger values of N_w and N_s are allowed but add to the
length of ssDNA that is to be synthesized and reduce the
net length of dsDNA that can be produced.

Let A_L be the actual length of dsDNA to be synthesized,
including any spacers. A_L must be no greater
than (2 M_DNA - N_w). Let Q_w be the number of nts that the
overlap window can deviate from center,

    Q_w = (2 M_DNA - N_w - A_L)/2.

Q_w is never negative. It is preferred that the two
fragments be approximately the same length so that the
amounts synthesized will be approximately equal. This
preference may be overridden by other considerations. The
overall yield of dsDNA is usually dominated by the
synthetic yield of the longer oligo-nt.
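The length limit and the allowed deviation of the overlap window follow directly from these definitions. A minimal sketch, using the values stated in the text as defaults (M_DNA = 200, N_w = 10); the function names are hypothetical.

```python
def max_dsdna_length(m_dna: int = 200, n_w: int = 10) -> int:
    """Longest dsDNA obtainable from two oligo-nts of at most m_dna nts overlapping by n_w."""
    return 2 * m_dna - n_w


def overlap_window_slack(a_l: int, m_dna: int = 200, n_w: int = 10) -> float:
    """Q_w = (2*M_DNA - N_w - A_L) / 2, the distance the overlap may sit from center.

    Raises ValueError when A_L exceeds the limit, since Q_w is never negative.
    The result is fractional when (2*M_DNA - N_w - A_L) is odd.
    """
    limit = max_dsdna_length(m_dna, n_w)
    if a_l > limit:
        raise ValueError(f"A_L = {a_l} exceeds the maximum of {limit} nts")
    return (limit - a_l) / 2


if __name__ == "__main__":
    print(max_dsdna_length())         # limit with the default parameters
    print(overlap_window_slack(300))  # slack for a 300-nt target
```

For a target at the length limit the overlap must sit exactly at the center; shorter targets leave room to shift the window, for instance to avoid variegated or palindromic bases.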

We use the following procedure to generate dsDNA of 
lengths up to (2 M DNA - N w ) nts through the use of Klenow 
fragment to extend synthetic ss DNA fragments that are not 
more than M DNA nts long. When a pair of long oligo-nts, 
complementary for N w nts at their 3* ends, are annealed 
there will be a free 3 ' hydroxyl and a long ssDNA chain 
continuing in the 5 1 direction on either side. We will 
refer to this situation as a 5 ? superoverhang . The 
procedure comprises: 

1) picking a non-palindromic subsequence of N_w to N_w+4
nts near the center of the dsDNA to be synthesized;
this region is called the overlap (typically, N_w is
10),

2) synthesizing a ssDNA molecule that comprises that
part of the anti-sense strand from its 5' end up to
and including the overlap,

3) synthesizing a ssDNA molecule that comprises that
part of the sense strand from its 5' end up to and
including the overlap,

4) annealing the two synthetic strands that are comple- 
mentary throughout the overlap region, and 

5) extending both superoverhangs with Klenow fragment 
and all four deoxynucleotide triphosphates. 
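
The five steps above can be sketched in code. This is an illustrative model only: it treats DNA as plain strings, assumes perfect annealing and complete extension, and uses hypothetical function names and a made-up 40-nt target sequence.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_oligos(antisense, start, n_w=10):
    """Steps 1-3: split a target into two oligo-nts sharing an overlap.

    antisense is the anti-sense strand written 5'->3'; the overlap is
    antisense[start:start + n_w]. Both returned oligo-nts are 5'->3'.
    """
    overlap = antisense[start:start + n_w]
    if overlap == reverse_complement(overlap):
        raise ValueError("overlap is palindromic; pick another window")
    oligo_a = antisense[:start + n_w]                 # 5' end through overlap
    oligo_s = reverse_complement(antisense[start:])   # sense strand, 5' end through overlap
    return oligo_a, oligo_s


def klenow_fill_in(oligo_a, oligo_s, n_w=10):
    """Steps 4-5: anneal at the 3' ends and extend; returns the anti-sense strand."""
    if oligo_a[-n_w:] != reverse_complement(oligo_s[-n_w:]):
        raise ValueError("3' ends are not complementary over n_w nts")
    # Extending oligo_a copies the 5' superoverhang of oligo_s.
    return oligo_a + reverse_complement(oligo_s[:-n_w])


target = "ATGGCTAGCTAGGCTAACGTTAGCCGATCGGATCCTAGGA"  # hypothetical 40-nt target
a, s = design_oligos(target, start=15)
print(klenow_fill_in(a, s) == target)  # True
```

Each synthetic strand here is 25 nts long, yet the filled-in product recovers the full 40-nt target, which is the point of the superoverhang strategy.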

Because M_DNA is not rigidly fixed at 200, the current
limits of 390 (= 2 M_DNA - N_w) nts overall and 200 in each
fragment are not rigid, but can be exceeded by 5 or 10
nts. Going beyond the limits of 390 and 200 will lead to
lower yields, but these may be acceptable in certain
cases.

Restriction enzymes do not cut well at sites closer
than about five base pairs from the end of blunt dsDNA
fragments (OLIP87 and p. 132 New England BioLabs 1990-1991
Catalogue). Therefore N_s nts (with N_s typically set to 5)
of spacer are added to ends that we intend to cut with a
restriction enzyme. If the plasmid is to be cut with a
blunt-cutting enzyme, then we do not add any spacer to the
corresponding end of the dsDNA fragment.

To choose the optimum site of overlap for the oligo-nt
fragments, first consider the anti-sense strand of the
DNA to be synthesized, including any spacers at the ends,
written (in upper case) from 5' to 3' and left-to-right.
N.B.: The N_w nt long overlap window can never include
bases that are to be variegated. N.B.: The N_w nt long
overlap should not be palindromic lest single DNA
molecules prime themselves. Place a N_w nt long window as
close to the center of the anti-sense sequence as
possible. Check to see whether one or more codons within the
window can be changed to increase the GC content without:
a) destroying a needed restriction site, b) changing amino
acid sequence, or c) making the overlap region
palindromic. If possible, change some AT base pairs to GC
pairs. If the GC content of the window is less than 50%,
slide the window right or left as much as Q_w nts to
maximize the number of C's and G's inside the window, but
without including any variegated bases. For each trial
setting of the overlap window, maximize the GC content by
silent codon changes, but do not destroy wanted
restriction sites or make the overlap palindromic. If the best
setting still has less than 50% GC, enlarge the window to
N_w+2 nts and place it within five nts of the center to
obtain the maximum GC content. If enlarging the window
one or two nts will increase the GC content, do so, but do
not include variegated bases.
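
The window search just described can be sketched as follows. This is a simplified illustration under stated assumptions: it only slides a fixed-size window and scores GC content; it does not attempt silent codon changes, restriction-site checks, or window enlargement. The function names and the toy sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def gc_count(seq):
    """Number of G and C bases in seq."""
    return sum(base in "GC" for base in seq)


def choose_overlap_window(antisense, variegated=(), n_w=10, q_w=0):
    """Return (start, gc) of the best overlap window, or None.

    Windows are tried from the center outward, up to q_w nts either way.
    Windows that touch a variegated position (0-based indices) or that
    are palindromic are skipped. Ties go to the window nearest center.
    """
    blocked = set(variegated)
    center_start = (len(antisense) - n_w) // 2
    best = None
    for shift in sorted(range(-q_w, q_w + 1), key=abs):
        start = center_start + shift
        if start < 0 or start + n_w > len(antisense):
            continue
        if blocked.intersection(range(start, start + n_w)):
            continue
        window = antisense[start:start + n_w]
        if window == reverse_complement(window):
            continue
        gc = gc_count(window)
        if best is None or gc > best[1]:
            best = (start, gc)
    return best


toy = "A" * 10 + "T" * 10 + "GGCCAGCGTC"  # hypothetical 30-nt strand
print(choose_overlap_window(toy, q_w=10))                   # (20, 8)
print(choose_overlap_window(toy, variegated={25}, q_w=10))  # (14, 4)
```

In the second call, marking position 25 as variegated pushes the window away from the GC-rich end, showing how the two N.B. constraints compete with GC maximization.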

Underscore the anti-sense strand from the 5' end up
to the right edge of the window. Write the complementary
sense sequence 3'-to-5' and left-to-right and in lower
case letters, under the anti-sense strand starting at the
left edge of the window and continuing all the way to the
right end of the anti-sense strand.

We will synthesize the underscored anti-sense strand 
and the part of the sense strand that we wrote. These two 
fragments, complementary over the length of the window of 
high GC content, are mixed in equimolar quantities and 
annealed. These fragments are extended with Klenow 
fragment and all four deoxynucleotide triphosphates to 
produce ds blunt-ended DNA. This DNA can be cut with 
appropriate restriction enzymes to produce the cohesive 
ends needed to ligate the fragment to other DNA. 

The present invention is not limited to any particular
method of DNA synthesis or construction. Conventional
DNA synthesizers may be used, with appropriate
reagent modifications for production of variegated DNA
(similar to that now used for production of mixed probes).
For example, the Milligen 7500 DNA synthesizer has seven
vials from which phosphoramidites may be taken. Normally,
the first four contain A, C, T, and G. The other three
vials may contain unusual bases such as inosine or
mixtures of bases, the so-called "dirty bottle". The
standard software allows programmed mixing of two, three,
or four bases in equimolar quantities.

The synthesized DNA may be purified by any
art-recognized technique, e.g., by high-pressure liquid
chromatography (HPLC) or PAGE.

The osp-pbd genes may be created by inserting vgDNA
into an existing parental gene, such as the osp-ipbd shown
to be displayable by a suitably transformed GP. The
present invention is not limited to any particular method
of introducing the vgDNA; however, two techniques are
discussed below.

In the case of cassette mutagenesis, the restriction
sites that were introduced when the gene for the inserted
domain was synthesized are used to introduce the synthetic
vgDNA into a plasmid or other OCV. Restriction digestions
and ligations are performed by standard methods (AUSU87).

In the case of single-stranded-oligonucleotide- 
directed mutagenesis, synthetic vgDNA is used to create 
diversity in the vector (BOTS85).

The modes of creating diversity in the population of 
GPs discussed herein are not the only modes possible. Any 
method of mutagenesis that preserves at least a large 
fraction of the information obtained from one selection 
and then introduces other mutations in the same domain 
will work. The limiting factors are the number of 
independent transformants that can be produced and the
amount of enrichment one can achieve through affinity- 
separation. Therefore the preferred embodiment uses a 
method of mutagenesis that focuses mutations into those 
residues that are most likely to affect the binding 
properties of the PBD and are least likely to destroy the 
underlying structure of the IPBD. 

Other modes of mutagenesis might allow other GPs to
be considered. For example, the bacteriophage λ is not a
useful cloning vehicle for cassette mutagenesis because
of the plethora of restriction sites. One can, however,
use single-stranded-oligo-nt-directed mutagenesis on λ
without the need for unique restriction sites. No one
has used single-stranded-oligo-nt-directed mutagenesis to
introduce the high level of diversity called for in the
present invention, but if it is possible, such a method
would allow use of phage with large genomes.

IV. H. Operative Cloning Vector 

The operative cloning vector (OCV) is a replicable
nucleic acid used to introduce the chimeric ipbd-osp or
osp-ipbd gene into the genetic package. When the genetic
package is a virus, it may serve as its own OCV. For
cells and spores, the OCV may be a plasmid, a virus, a
phagemid, or a chromosome.

The OCV is preferably small (less than 10 kb), stable
(even after insertion of at least 1 kb DNA), present in
multiple copies within the host cell, and selectable with
appropriate media. It is desirable that cassette
mutagenesis be practical in the OCV; preferably, at least 25
restriction enzymes are available that do not cut the
OCV. It is likewise desirable that single-stranded
mutagenesis be practical. If a suitable OCV does not
already exist, it may be engineered by manipulation of
available vectors.

When the GP is a bacterial cell or spore, the OCV is
preferably a plasmid because genes on plasmids are much 
more easily constructed and mutated than are genes in the 
bacterial chromosome. When bacteriophage are to be used, 
the osp-ipbd gene is inserted into the phage genome. The 
synthetic osp-ipbd genes can be constructed in small 
vectors and transferred to the GP genome when complete. 

Phage such as M13 do not confer antibiotic resistance
on the host so that one can not select for cells infected
with M13. An antibiotic resistance gene can be engineered
into the M13 genome (HINE80). More virulent phage, such
as ΦX174, make discernable plaques that can be picked, in
which case a resistance gene is not essential;
furthermore, there is no room in the ΦX174 virion to add any new
genetic material. Inability to include an antibiotic
resistance gene is a disadvantage because it limits the
number of GPs that can be screened.

It is preferred that GP(IPBD) carry a selectable
marker not carried by wtGP. It is also preferred that
wtGP carry a selectable marker not carried by GP(IPBD).

A derivative of M13 is the most preferred OCV when 
the phage also serves as the GP. Wild-type M13 does not 
confer any resistances on infected cells; M13 is a pure 
parasite. A "phagemid" is a hybrid between a phage and a 
plasmid, and is used in this invention. Double-stranded 
plasmid DNA isolated from phagemid-bearing cells is 
denoted by the standard convention, e.g. pXY24. Phage 
prepared from these cells would be designated XY24. 
Phagemids such as Bluescript K/S (sold by Stratagene) are 
not preferred for our purposes because Bluescript does not 
contain the full genome of M13 and must be rescued by 
coinfection with competent wild-type M13. Such coinfec- 
tions could lead to genetic recombination yielding 
heterogeneous phage unsuitable for the purposes of the 
present invention. Phagemids may be entirely suitable for
developing a gene that causes an IPBD to appear on the 
surface of phage-like genetic packages. 

It is also well known that plasmids containing the
ColE1 origin of replication can be greatly amplified if
protein synthesis is halted in a log-phase culture.
Protein synthesis can be halted by addition of
chloramphenicol or other agents (MANI82).

The bacteriophage M13 bla 61 (ATCC 37039) is derived
from wild-type M13 through the insertion of the
β-lactamase gene (HINE80). This phage contains 8.13 kb of DNA.
M13 bla cat 1 (ATCC 37040) is derived from M13 bla 61
through the additional insertion of the chloramphenicol
resistance gene (HINE80); M13 bla cat 1 contains 9.88 kb
of DNA. Although neither of these variants of M13
contains the ColE1 origin of replication, either could be
used as a starting point to construct a cloning vector
with this feature.

IV. I. Transformation of cells: 

When the GP is a cell, the population of GPs is 
created by transforming the cells with suitable OCVs. 
When the GP is a phage, the phage are genetically engin- 
eered and then transfected into host cells suitable for 
amplification. When the GP is a spore, cells capable of 
sporulation are transformed with the OCV while in a normal 
metabolic state, and then sporulation is induced so as to 
cause the OSP-PBDs to be displayed. The present invention 
is not limited to any one method of transforming cells 
with DNA. The procedure given in the examples is a 
modification of that of Maniatis (p250, MANI82). One
preferably obtains at least 10^7 and more preferably at
least 10^8 transformants/μg of CCC DNA.

The transformed cells are grown first under non-selective
conditions that allow expression of plasmid
genes and then selected to kill untransformed cells.
Transformed cells are then induced to express the osp-pbd
gene at the appropriate level of induction. The GPs
carrying the IPBD or PBDs are then harvested by methods
appropriate to the GP at hand, generally centrifugation
to pelletize GPs and resuspension of the pellets in
sterile medium (cells) or buffer (spores or phage). They
are then ready for verification that the display strategy
was successful (where the GPs all display a "test" IPBD)
or for affinity selection (where the GPs display a variety
of different PBDs).

IV. J. Verification of Display Strategy: 

The harvested packages are tested to determine
whether the IPBD is present on the surface. In any tests
of GPs for the presence of IPBD on the GP surface, any
ions or cofactors known to be essential for the stability
of IPBD or AfM(IPBD) are included at appropriate levels.
The tests can be done: a) by affinity labeling, b)
enzymatically, c) spectrophotometrically, d) by affinity
separation, or e) by affinity precipitation. The AfM(IPBD)
in this step is one picked to have strong affinity
(preferably, K_d < 10^-11 M) for the IPBD molecule and
little or no affinity for the wtGP. For example, if BPTI
were the IPBD, trypsin, anhydrotrypsin, or antibodies to
BPTI could be used as the AfM(BPTI) to test for the
presence of BPTI. Anhydrotrypsin, a trypsin derivative
with serine 195 converted to dehydroalanine, has no
proteolytic activity but retains its affinity for BPTI
(AKOH72 and HUBE77).

Preferably, the presence of the IPBD on the surface
of the GP is demonstrated through the use of a soluble,
labeled derivative of an AfM(IPBD) with high affinity for
IPBD. The label could be: a) a radioactive atom such as
125I, b) a chemical entity such as biotin, or c) a
fluorescent entity such as rhodamine or fluorescein. The
labeled derivative of AfM(IPBD) is denoted as AfM(IPBD)*.
The preferred procedure is:

1) mix AfM(IPBD)* with GPs that are to be tested for the
presence of IPBD; conditions of mixing should favor
binding of IPBD to AfM(IPBD)*,

2) separate GPs from unbound AfM(IPBD)* by use of:

a) a molecular sizing filter that will pass
AfM(IPBD)* but not GPs,

b) centrifugation, or

c) a molecular sizing column (such as Sepharose or
Sephadex) that retains free AfM(IPBD)* but not
GPs,

3) quantitate the AfM(IPBD)* bound by GPs.

Alternatively, if the IPBD has a known biochemical
activity (enzymatic or inhibitory), its presence on the GP
can be verified through this activity. For example, if
the IPBD were BPTI, then one could use the stoichiometric
inactivation of trypsin not only to demonstrate the
presence of BPTI, but also to quantitate the amount.

If the IPBD has strong, characteristic absorption 
bands in the visible or UV that are distinct from absorp- 
tion by the wtGP, then another alternative for measuring 
the IPBD displayed on the GP is a spectrophotometric 
measurement. For example, if IPBD were azurin, the 
visible absorption could be used to identify GPs that 
display azurin. 

Another alternative is to label the GPs and measure
the amount of label retained by immobilized AfM(IPBD).
For example, the GPs could be grown with a radioactive
precursor, such as 32P or 3H-thymidine, and the
radioactivity retained by immobilized AfM(IPBD) measured.

Another alternative is to use affinity
chromatography; the ability of a GP bearing the IPBD to bind a
matrix that supports an AfM(IPBD) is measured by reference
to the wtGP.

Another alternative for detecting the presence of 
IPBD on the GP surface is affinity precipitation. 

If random DNA has been used, then affinity selection
procedures are used to obtain a clonal isolate that has
the display-of-IPBD phenotype. Alternatively, clonal
isolates may be screened for the display-of-IPBD
phenotype. The tests of this step are applied to one or more
of these clonal isolates.

If no isolates that bind to the affinity molecule are 
obtained we take corrective action as disclosed below. 

If one or more of the tests above indicates that the
IPBD is displayed on the GP surface, we verify that the
binding of molecules having known affinity for IPBD is due
to the chimeric osp-ipbd gene through the use of standard
genetic and biochemical techniques, such as:

1) transferring the osp-ipbd gene into the parent GP to 
verify that osp-ipbd confers binding, 

2) deleting the osp-ipbd gene from the isolated GP to 
verify that loss of osp-ipbd causes loss of binding, 

3) showing that binding of GPs to AfM(IPBD) correlates 
with [XINDUCE] (in those cases that expression of 
osp-ipbd is controlled by [XINDUCE]), and 

4) showing that binding of GPs to AfM(IPBD) is specific 
to the immobilized AfM(IPBD) and not to the support 
matrix. 

Variation of: a) binding of GPs by soluble
AfM(IPBD)*, b) absorption caused by IPBD, and c) biochemical
reactions of IPBD are linear in the amount of IPBD
displayed. Presence of IPBD on the GP surface is
indicated by a strong correlation between [XINDUCE] and the
reactions that are linear in the amount of IPBD.
Leakiness of the promoter is not likely to present problems of
high background with assays that are linear in the amount
of IPBD. These experiments may be quicker and easier than
the genetic tests. Interpreting the effect of [XINDUCE]
on binding to an {AfM(IPBD)} column, however, may be
problematic unless the regulated promoter is completely
repressed in the absence of [XINDUCE]. The affinity
retention of GP(IPBD)s is not linear in the number of
IPBDs/GP and there may be, for example, little phenotypic
difference between GPs bearing 5 IPBDs and GPs bearing 50
IPBDs. The demonstration that binding is to AfM(IPBD) and
the genetic tests are essential; the tests with XINDUCE
are optional.

We sequence the relevant ipbd gene fragment from each
of several clonal isolates to determine the construction.
We also establish the maximum salt concentration and pH
range for which the GP(IPBD) binds the chosen AfM(IPBD).
This is preferably done by measuring, as a function of
salt concentration and pH, the retention of AfM(IPBD)* on
molecular sizing filters that pass AfM(IPBD)* but not GP.
This information will be used in refining the affinity
selection scheme.

IV. K. Analysis and Correction of Display Problems

If the IPBD is displayed on the outside of the GP,
and if that display is clearly caused by the introduced
osp-ipbd gene, we proceed with variegation; otherwise we
analyze the result and adopt appropriate corrective
measures. If we have unsuccessfully attempted to fuse an
ipbd fragment to a natural osp fragment, our options are:
1) pick a different fusion to the same osp by a) using the
opposite end of osp, b) keeping more or fewer residues
from osp in the fusion, for example, in increments of 3 or
4 residues, c) trying a known or predicted domain
boundary, d) trying a predicted loop or turn position, 2) pick
a different osp, or 3) switch to the random DNA method. If we
have just tried the random DNA method unsuccessfully, our
options are: 1) choose a different relationship between
ipbd fragment and random DNA (ipbd first, random DNA
second or vice versa), 2) try a different degree of
partial digestion, a different enzyme for partial
digestion, a different degree of shearing or a different source
of natural DNA, or 3) switch to the natural OSP method.
If all reasonable OSPs of the current GP have been tried
and the random DNA method has been tried, both without
success, we pick a new GP.

We may illustrate the ways in which problems may be
attacked by using the example of BPTI as the IPBD, the M13
phage as the GP, and the major coat (gene VIII) protein as
the OSP. The following amino-acid sequence, called
AA_seq2, illustrates how the sequence for mature BPTI
(shown underscored) may be inserted immediately after the
signal sequence of M13 precoat protein (indicated by the
arrow) and before the sequence for the M13 CP.

AA_seq2

         1    1    2    2    3    3    4    4    5
    5    0    5    0   V5    0    5    0    5    0
MKKSLVLKASVAVATLVPMLSFARPDFCLEPPYTGPCKARIIRYFYNAKA
                       ---------------------------

    5    6    6    7    7    8    8    9    9   10
    5    0    5    0    5    0    5    0    5    0
GLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAAEGDDPAKAAFNSLQASAT
-------------------------------

   10   11   11   12   12   13
    5    0    5    0    5    0
EYIGYAWAMVVVIVGATIGIKLFKKFTSKAS

We adopt the convention that sequence numbers of
fusion proteins refer to the fusion, as coded, unless
otherwise noted. Thus the alanine that begins M13 CP is
referred to as "number 82", "number 1 of M13 CP", or
"number 59 of the mature BPTI-M13 CP fusion".
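
The three numbering schemes for the same residue can be related by fixed offsets. The sketch below derives those offsets from the single example given in the text (82, 1, and 59 for the first alanine of M13 CP); the function names are hypothetical, and only the first 100 residues of AA_seq2 are transcribed.

```python
# First 100 residues of AA_seq2 as printed above (index 0 is residue 1).
AA_SEQ2_1_100 = (
    "MKKSLVLKASVAVATLVPMLSFARPDFCLEPPYTGPCKARIIRYFYNAKA"
    "GLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAAEGDDPAKAAFNSLQASAT"
)

# Offsets implied by "number 82" = "number 1 of M13 CP"
# = "number 59 of the mature BPTI-M13 CP fusion".
MATURE_OFFSET = 82 - 59  # residues removed with the signal sequence
M13_CP_OFFSET = 82 - 1   # residues preceding the M13 CP portion


def fusion_to_mature(n):
    """Number in the mature fusion for residue n of the fusion as coded."""
    return n - MATURE_OFFSET


def fusion_to_m13_cp(n):
    """Number within M13 CP for residue n of the fusion as coded."""
    return n - M13_CP_OFFSET


print(AA_SEQ2_1_100[82 - 1], fusion_to_m13_cp(82), fusion_to_mature(82))
# A 1 59
```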

It is desirable to determine where, exactly, the BPTI
binding domain is being transported: is it remaining in
the cytoplasm? Is it free within the periplasm? Is it
attached to the inner membrane? Proteins in the
periplasm can be freed through spheroplast formation using
lysozyme and EDTA in a concentrated sucrose solution
(BIRD67, MALA64). If BPTI were free in the periplasm, it
would be found in the supernatant. Trypsin labeled with
125I would be mixed with supernatant and passed over a
non-denaturing molecular sizing column and the radioactive
fractions collected. The radioactive fractions would then
be analyzed by SDS-PAGE and examined for BPTI-sized bands
by silver staining.

Spheroplast formation exposes proteins anchored in
the inner membrane. Spheroplasts would be mixed with
AHTrp* and then either filtered or centrifuged to separate
them from unbound AHTrp*. After washing with hypertonic
buffer, the spheroplasts would be analyzed for extent of
AHTrp* binding.

If BPTI were found free in the periplasm, then we 
would expect that the chimeric protein was being cleaved 
both between BPTI and the M13 mature coat sequence and 
between BPTI and the signal sequence. In that case, we 
should alter the BPTI/M13 CP junction by inserting vgDNA 
at codons for residues 78-82 of AA_seq2. 

If BPTI were found attached to the inner membrane,
then two hypotheses can be formed. The first is that the
chimeric protein is being cut after the signal sequence,
but is not being incorporated into LG7 virion; the
treatment would also be to insert vgDNA between residues
78 and 82 of AA_seq2. The alternative hypothesis is that
BPTI could fold and react with trypsin even if signal
sequence is not cleaved. N-terminal amino acid sequencing
of trypsin-binding material isolated from cell homogenate
determines what processing is occurring. If signal
sequence were being cleaved, we would use the procedure
above to vary residues between C78 and A82; subsequent
passes would add residues after residue 81. If signal
sequence were not being cleaved, we would vary residues
between 23 and 27 of AA_seq2. Subsequent passes through
that process would add residues after 23.

If BPTI were found neither in the periplasm nor on 
the inner membrane, then we would expect that the fault 
was in the signal sequence or the signal-sequence-to-BPTI 
junction. The treatment in this case would be to vary 
residues between 23 and 27. 

Analytical experiments to determine what has gone
wrong take time and effort and, for the foreseen outcomes,
indicate variations in only two regions. Therefore, we
believe it prudent to try the synthetic experiments
described below without doing the analysis. For example,
these six experiments that introduce variegation into the
bpti-gene VIII fusion could be tried:

1) 3 variegated codons between residues 78 and 82 using
olig#12 and olig#13,

2) 3 variegated codons between residues 23 and 27 using 
olig#14 and olig#15, 

3) 5 variegated codons between residues 78 and 82 using 
olig#13 and olig#12a, 

4) 5 variegated codons between residues 23 and 27 using 
olig#15 and olig#14a, 

5) 7 variegated codons between residues 78 and 82 using 
olig#13 and olig#12b, and 

6) 7 variegated codons between residues 23 and 27 using 
olig#15 and olig#14b. 
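
A rough count of how many DNA sequences each experiment generates helps relate the number of variegated codons to the number of transformants obtainable. The sketch below assumes each variegated codon allows four bases at the first two positions and two at the third, as in the mixtures defined for these oligo-nts; the function name is hypothetical, and the figure of 10^8 transformants per μg is taken as an assumed upper benchmark for comparison only.

```python
def dna_diversity(n_codons, bases_per_position=(4, 4, 2)):
    """Number of distinct DNA sequences for n_codons variegated codons."""
    per_codon = 1
    for n_bases in bases_per_position:
        per_codon *= n_bases  # 4 * 4 * 2 = 32 codons per position
    return per_codon ** n_codons


ASSUMED_TRANSFORMANTS = 1e8  # assumed benchmark, per microgram of DNA

for n in (3, 5, 7):
    variants = dna_diversity(n)
    print(n, variants, variants <= ASSUMED_TRANSFORMANTS)
```

Under these assumptions, three and five variegated codons give 32,768 and 33,554,432 DNA sequences respectively, both within reach of a single transformation, whereas seven codons give about 3.4 x 10^10 and would be sampled only partially.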

To alter the BPTI-M13 CP junction, we introduce DNA
variegated at codons for residues between 78 and 82 into
the SphI and SfiI sites of pLG7. The residues after the
last cysteine are highly variable in amino acid sequences
homologous to BPTI, both in composition and length; in
Table 25 these residues are denoted as G79, G80, and A81.
The first part of the M13 CP is denoted as A82, E83, and
G84. One of the oligo-nts olig#12, olig#12a, or olig#12b
and the primer olig#13 are synthesized by standard
methods. The oligo-nts are:

residue        75  76  77  78  79  80  81  82  83
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|GCT|GAA|-

    84  85  86  87  88  89  90  91
   GGT|GAT|GAT|CCG|GCC|AAA|GCG|GCC|gcg|cc 3'   olig#12

residue        75  76  77  78  79  80  81 81a 81b
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|qfk|qfk|-

    82  83  84  85  86  87
   GCT|GAA|GGT|GAT|GAT|CCG|-

    88  89  90  91
   GCC|AAA|GCG|GCC|gcg|cc 3'   olig#12a

residue        75  76  77  78  79  80  81 81a 81b
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|qfk|qfk|-

   81c 81d  82  83  84  85  86  87
   qfk|qfk|GCT|GAA|GGT|GAT|GAT|CCG|-

    88  89  90  91
   GCC|AAA|GCG|GCC|gcg|cc 3'   olig#12b

residue     91  90  89  88  87  86
5' gg|cgc|GGC|CGC|TTT|GGC|CGG|ATC 3'   olig#13

where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and 0.30
G), f is a mixture of (0.22 T, 0.16 C, 0.40 A, and 0.22
G), and k is a mixture of equal parts of T and G. The
bases shown in lower case at either end are spacers and
are not incorporated into the cloned gene. The primer is
complementary to the 3' end of each of the longer
oligo-nts. One of the variegated oligo-nts and the primer
olig#13 are combined in equimolar amounts and annealed.
The dsDNA is completed with all four (nt)TPs and Klenow
fragment. The resulting dsDNA and RF pLG7 are cut with
both SfiI and SphI, purified, mixed, and ligated. We then
select a transformed clone that, when induced with IPTG,
binds AHTrp.
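
The amino-acid distribution encoded by one qfk codon follows from the base mixtures and the standard genetic code. The sketch below is an illustration under a stated assumption: each base is incorporated independently and exactly in proportion to its fraction in the mixture, which real synthesis chemistry only approximates. The function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Base mixtures as given in the text for q, f, and k.
Q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
F = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
K = {"T": 0.50, "G": 0.50}


def codon_distribution(first, second, third):
    """Probability of each amino acid ('*' = stop) for one variegated codon."""
    dist = {}
    for b1, p1 in first.items():
        for b2, p2 in second.items():
            for b3, p3 in third.items():
                aa = CODON_TABLE[b1 + b2 + b3]
                dist[aa] = dist.get(aa, 0.0) + p1 * p2 * p3
    return dist


dist = codon_distribution(Q, F, K)
for aa, p in sorted(dist.items(), key=lambda item: -item[1]):
    print(f"{aa}: {p:.4f}")
```

Because the third base is restricted to T or G, the only stop codon reachable is TAG, so under the stated assumption the stop frequency per variegated codon is 0.26 x 0.40 x 0.50 = 0.052, while all twenty amino acids remain accessible.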

To vary the junction between M13 signal sequence and
BPTI, we introduce DNA variegated at codons for residues
between 23 and 27 into the KpnI and XhoI sites of pLG7.
The first three residues are highly variable in amino acid
sequences homologous to BPTI. Homologous sequences also
vary in length at the amino terminus. One of the
oligo-nts olig#14, olig#14a, or olig#14b and the primer olig#15
are synthesized by standard methods. The oligo-nts are:

residue       17  18  19  20  21  22  23  24  25
5' g|gcc|gcG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|fxk|fxk|-

    26  27  28  29  30
   fxk|TTC|TGT|CTC|GAG|cgc|ccg|cga 3'   olig#14

residue       17  18  19  20  21  22  23  24  25  26
5' g|gcc|gcG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|fxk|fxk|fxk|-

   26a 26b  27  28  29  30
   fxk|fxk|TTC|TGT|CTC|GAG|cgc|ccg|cga 3'   olig#14a

residue       17  18  19  20  21  22  23  24  25  26
5' g|gcc|gcG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|fxk|fxk|fxk|-

   26a 26b 26c 26d  27  28  29  30
   fxk|fxk|fxk|fxk|TTC|TGT|CTC|GAG|cgc|ccg|cga 3'   olig#14b

5' tcg|cgg|gcg|CTC|GAG|ACA|GAA 3'   olig#15



where f is a mixture of (0.26 T, 0.18 C, 0.26 A, and 0.30
G), x is a mixture of (0.22 T, 0.16 C, 0.40 A, and 0.22
G), and k is a mixture of equal parts of T and G. The
bases shown in lower case at either end are spacers and
are not incorporated into the cloned gene. One of the
variegated oligo-nts and the primer are combined in
equimolar amounts and annealed. The dsDNA is completed
with all four (nt)TPs and Klenow fragment. The resulting
dsDNA and RF pLG7 are cut with both KpnI and XhoI,
purified, mixed, and ligated. We select a transformed
clone that, when induced with IPTG, binds AHTrp or trp.

Other numbers of variegated codons could be used. 

If none of these approaches produces a working 
chimeric protein, we may try a different signal sequence. 
If that doesn't work, we may try a different OSP.

V. AFFINITY SELECTION OF TARGET-BINDING MUTANTS

V.A. Affinity Separation Technology, Generally 

Affinity separation is used initially in the present
invention to verify that the display system is working,
i.e., that a chimeric outer surface protein has been
expressed and transported to the surface of the genetic
package and is oriented so that the inserted binding
domain is accessible to target material. When used for
this purpose, the binding domain is a known binding domain
for a particular target and that target is the affinity
molecule used in the affinity separation process. For
example, a display system may be validated by
inserting DNA encoding BPTI into a gene encoding an outer
surface protein of the genetic package of interest, and
testing for binding to anhydrotrypsin, which is normally
bound by BPTI.

If the genetic packages bind to the target, then we 
have confirmation that the corresponding binding domain is 
indeed displayed by the genetic package. Packages which 
display the binding domain (and thereby bind the target) 
are separated from those which do not. 

Once the display system is validated, it is possible 
to use a variegated population of genetic packages which 
display a variety of different potential binding domains,
and use affinity separation technology to determine how 
well they bind to one or more targets. This target need 
not be one bound by a known binding domain which is 
parental to the displayed binding domains, i.e. , one may 
select for binding to a new target. 

For example, one may variegate a BPTI binding domain
and test for binding, not to trypsin, but to another
serine protease, such as human neutrophil elastase or
cathepsin G, or even to a wholly unrelated target, such as
horse heart myoglobin.

The term "affinity separation means" includes, but is
not limited to: a) affinity column chromatography, b)
batch elution from an affinity matrix material, c) batch
elution from an affinity material attached to a plate, d)
fluorescence activated cell sorting, and e)
electrophoresis in the presence of target material. "Affinity
material" is used to mean a material with affinity for the
material to be purified, called the "analyte". In most
cases, the association of the affinity material and the
analyte is reversible so that the analyte can be freed
from the affinity material once the impurities are washed
away.

The procedures described in sections V.H, V.I and V.J
are not required for practicing the present invention, but
may facilitate the development of novel binding proteins
thereby.

V.B. Affinity Chromatography, Generally 

Affinity column chromatography, batch elution from an 
affinity matrix material held in some container, and batch 
elution from a plate are very similar and hereinafter will 
be treated under "affinity chromatography." 

If affinity chromatography is to be used, then:

1) the molecules of the target material must be of
sufficient size and chemical reactivity to be applied
to a solid support suitable for affinity separation,

2) after application to a matrix, the target material
preferably does not react with water,

3) after application to a matrix, the target material
preferably does not bind or degrade proteins in a
non-specific way, and

4) the molecules of the target material must be
sufficiently large that attaching the material to a
matrix allows enough unaltered surface area
(generally at least 500 Å^2, excluding the atom that is
connected to the linker) for protein binding.

Affinity chromatography is the preferred separation 
means, but FACS, electrophoresis, or other means may also 
be used. 

V.C. Fluorescent-Activated Cell Sorting, Generally 

Fluorescent-activated cell sorting involves use of an 
affinity material that is fluorescent per se or is labeled 
with a fluorescent molecule. Current commercially 
available cell sorters require 800 to 1000 molecules of 
fluorescent dye, such as Texas red, bound to each cell. 
FACS can sort 10³ cells or viruses/sec.

FACS (e.g. FACStar from Becton-Dickinson, Mountain
View, CA) is most appropriate for bacterial cells and
spores because the sensitivity of the machines requires
approximately 1000 molecules of fluorescent label bound to
each GP to accomplish a separation. OSPs such as OmpA,
OmpF, and OmpC are present at >10⁴/cell, often as much as
10⁵/cell. Thus use of FACS with PBDs displayed on one of
the OSPs of a bacterial cell is attractive. This is
particularly true if the target is quite small so that
attachment to a matrix has a much greater effect than
would attachment to a dye. To optimize FACS separation of
GPs, we use a derivative of Afm(IPBD) that is labeled with
a fluorescent molecule, denoted Afm(IPBD)*. The variables
to be optimized include: a) amount of IPBD/GP, b)
concentration of Afm(IPBD)*, c) ionic strength, d)
concentration of GPs, and e) parameters pertaining to
operation of the FACS machine. Because Afm(IPBD)* and GPs
interact in solution, the binding will be linear in both
[Afm(IPBD)*] and [displayed IPBD]. Preferably, these two
parameters are varied together. The other parameters can
be optimized independently.
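
The linearity noted above can be illustrated numerically. The following is a minimal sketch, assuming simple 1:1 binding between Afm(IPBD)* and displayed IPBD with a single dissociation constant; the function names and all concentrations are hypothetical and are chosen only so that both species are well below K_d, where the amount of complex is approximately [Afm(IPBD)*][IPBD]/K_d.

```python
import math


def complex_conc(a_total, p_total, kd):
    """Exact equilibrium concentration (M) of a 1:1 complex.

    Solves x**2 - (a + p + kd) * x + a * p = 0 for the physical root,
    written in a form that avoids subtracting nearly equal numbers.
    """
    b = a_total + p_total + kd
    return 2.0 * a_total * p_total / (b + math.sqrt(b * b - 4.0 * a_total * p_total))


def linear_approx(a_total, p_total, kd):
    """Low-occupancy approximation: complex ~ [A][P]/K_d."""
    return a_total * p_total / kd


# Hypothetical values: K_d = 1e-6 M, labeled Afm(IPBD)* at 1e-9 M,
# displayed IPBD at 1e-10 M (both far below K_d).
kd = 1e-6
afm_star = 1e-9
ipbd = 1e-10

exact = complex_conc(afm_star, ipbd, kd)
approx = linear_approx(afm_star, ipbd, kd)
doubled = complex_conc(2.0 * afm_star, ipbd, kd)

print(f"exact complex      : {exact:.3e} M")
print(f"linear estimate    : {approx:.3e} M")
print(f"ratio on doubling A: {doubled / exact:.3f}")
```

Under these assumptions the signal scales with the product of the two concentrations, which is one way to see why amount of IPBD/GP and concentration of Afm(IPBD)* are best varied together. The approximation fails once either concentration approaches K_d.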

If FACS is to be used as the affinity separation 
means, then: 

1) the molecules of the target material must be of
sufficient size and chemical reactivity to be
conjugated to a suitable fluorescent dye or the
target must itself be fluorescent,

2) after any necessary fluorescent labeling, the target
preferably does not react with water,

3) after any necessary fluorescent labeling, the target
material preferably does not bind or degrade proteins
in a non-specific way, and

4) the molecules of the target material must be
sufficiently large that attaching the material to a
suitable dye allows enough unaltered surface area
(generally at least 500 Å², excluding the atom that
is connected to the linker) for protein binding.

V.D. Affinity Electrophoresis, Generally

Electrophoretic affinity separation involves electrophoresis
of viruses or cells in the presence of target
material, wherein the binding of said target material
changes the net charge of the virus particles or cells.
It has been used to separate bacteriophages on the basis
of charge (SERW87).

Electrophoresis is most appropriate to bacteriophage
because of their small size (SERW87). Electrophoresis is
a preferred separation means if the target is so small
that chemically attaching it to a column or to a
fluorescent label would essentially change the entire target.
For example, chloroacetate ions contain only seven atoms
and would be essentially altered by any linkage. GPs that
bind chloroacetate would become more negatively charged
than GPs that do not bind the ion and so these classes of
GPs could be separated.

If affinity electrophoresis is to be used, then:

1) the target must either be charged or of such a nature 
that its binding to a protein will change the charge 
of the protein, 

2) the target material preferably does not react with
water,

3) the target material preferably does not bind or
degrade proteins in a non-specific way, and

4) the target must be compatible with a suitable gel
material.

The present invention makes use of affinity separation
of bacterial cells or bacterial viruses (or other
genetic packages) to enrich a population for those cells
or viruses carrying genes that code for proteins with
desirable binding properties.

V.E. Target Materials 

The present invention may be used to select for
binding domains which bind to one or more target materials,
and/or fail to bind to one or more target materials.
Specificity, of course, is the ability of a binding
molecule to bind strongly to a limited set of target
materials, while binding more weakly or not at all to
another set of target materials from which the first set
must be distinguished.

The target materials may be organic macromolecules, 
such as polypeptides, lipids, polynucleic acids, and 
polysaccharides, but are not so limited. Almost any 
molecule that is stable in aqueous solvent may be used as 
a target. The following list of possible targets is given 
as illustration and not as limitation. The categories are 
not strictly mutually exclusive. The omission of any 
category is not to be construed to imply that said 
category is unsuitable as a target. 

A. Peptides

1) human β endorphin (Merck Index 3528)

2) dynorphin (MI 3458) 

3) Substance P (MI 8834) 

4) Porcine somatostatin (MI 8671) 

5) human atrial natriuretic factor (MI 887)

6) human calcitonin 

7) glucagon 

B. Proteins 

I. Soluble Proteins 

a. Hormones

1) human TNF (MI 9411)

2) Interleukin-1 (MI 4895)

3) Interferon-γ (MI 4894)

4) Thyrotropin (MI 9709)

5) Interferon-α (MI 4892)

6) Insulin (MI 4887, p. 789) 

b. Enzymes 

1) human neutrophil elastase 

2) Human thrombin 

3) human Cathepsin G 

4) human tryptase 

5) human chymase 

6) human blood clotting Factor Xa

7) any retro-viral Pol protease

8) any retro-viral Gag protease 

9) dihydrofolate reductase 

10) Pseudomonas putida cytochrome P450 CAM 

11) human pyruvate kinase 

12) E. coli pyruvate kinase

13) jack bean urease

14) aspartate transcarbamylase (E. coli)

15) ras protein 

16) any protein-tyrosine kinase 

c. Inhibitors

1) aprotinin (MI 784)

2) human α1-anti-trypsin

3) phage λ cI (inhibits DNA transcription)

d. Receptors

1) TNF receptor 

2) IgE receptor 

3) LamB

4) CD4 

5) IL-1 receptor 

e. Toxins

1) ricin (also an enzyme)

2) α Conotoxin GI

3) melittin

4) Bordetella pertussis adenylate cyclase 
(also an enzyme) 

5) Pseudomonas aeruginosa hemolysin 

f. Other proteins

1) horse heart myoglobin 

2) human sickle-cell haemoglobin 

3) human deoxy haemoglobin 

4) human CO haemoglobin 

5) human low-density lipoprotein (a lipoprotein)

6) human IgG (combining site removed or
blocked) (a glycoprotein)

7) influenza haemagglutinin

8) phage λ capsid

9) fibrinogen 

10) HIV-1 gp120

11) Neisseria gonorrhoeae pilin 

12) fibril or flagellar protein from spirocha- 
ete bacterial species such as those that 
cause syphilis, Lyme disease, or relapsing 
fever 

13) pro-enzymes such as prothrombin and 
trypsinogen 

II. Insoluble Proteins

1) silk 

2) human elastin 

3) keratin 

4) collagen 

5) fibrin 

C. Nucleic acids

a. DNA

1) ds DNA : 5'-ACTAGTCTC-3'
            3'-TGATCAGAG-5'

2) ds DNA : 5'-CCGTCGAATCCGC-3'
            3'-GGCAGTTTAGGCG-5'
   (Note mismatch)

3) ss DNA : 5'-CGTAACCTCGTCATTA-3'
   (No hair pin)

4) ss DNA : 5'-CCGTAGGT-+
                        |
            3'-GGCATCCA-+
   (Note hair pin)

5) dsDNA with cohesive ends :
            5'-CACGGCTATTACGGT-3'
               3'-CCGATAATGCCA-5'

b. RNA 

1) yeast Phe tRNA 

2) ribosomal RNA 

3) segment of mRNA

D. Organic molecules (not peptide, protein, or nucleic
acid)

I. Small and monomeric

1) cholesterol 

2) aspartame 

3) bilirubin 

4) morphine

5) codeine

6) heroin

7) dichlorodiphenyltrichloroethane (DDT)

8) prostaglandin PGE2

9) actinomycin

10) 2,2,3-trimethyldecane

11) Buckminsterfullerene

12) cortivazol (MI 2536, p. 397)

II. Polymers 

1) cellulose 

2) chitin 

III. Others

1) O-antigen of Salmonella enteritidis (a
lipopolysaccharide)

E. Inorganic compounds

1) asbestos

2) zeolites

3) hydroxylapatite

4) 111 face of crystalline silicon 

5) paulingite 

6) U(IV) (uranium ions) 

7) Au(III) (gold ions) 

F. Organometallic compounds

1) iron (III) haem 

2) cobalt haem 

3) cobalamine 

4) (isopropylamino)₆Cr(III)

Serine proteases are an especially interesting class
of potential target materials. Serine proteases are
ubiquitous in living organisms and play vital roles in
processes such as: digestion, blood clotting, fibrinolysis,
immune response, fertilization, and post-translational
processing of peptide hormones. Although the role
these enzymes play is vital, uncontrolled or inappropriate
proteolytic activity can be very damaging. Several serine
proteases are directly involved in serious disease states.
Uncontrolled neutrophil elastase (NE) (also known as
leukocyte elastase) is thought to be the major cause of
emphysema (BEIT86, HUBB86, HUBB89, HUTC87, SOMM90, WEWE87)
whether caused by congenital lack of α-1-antitrypsin or by
smoking. NE is also implicated as an essential ingredient
in the pernicious cycle of:

    (excess secretion of proteases by neutrophils)
        -> (inflammation)
        -> (recruitment of neutrophils)
        -> (excess secretion of proteases by neutrophils)



observed in cystic fibrosis (CF) (NADE90). Inappropriate
NE activity is very harmful and to stop the progression of
emphysema or to alleviate the symptoms of CF, an inhibitor
of very high affinity is needed. The inhibitor must be
very specific to NE lest it inhibit other vital serine
proteases or esterases. Nadel (NADE90) has suggested that
onset of excess secretion is initiated by 10⁻¹⁰ M NE;
thus, the inhibitor must reduce the concentration of free
NE to well below this level. Thus human neutrophil
elastase is a preferred target and a highly stable protein
is a preferred IPBD. In particular, BPTI, ITI-D1, or
another BPTI homologue is a preferred IPBD for development
of an inhibitor to HNE. Other preferred IPBDs for making
an inhibitor to HNE include CMTI-III, SLPI, Eglin,
α-conotoxin GI, and n Conotoxins.

HNE is not the only serine protease for which an
inhibitor would be valuable. Works concerning uses of
protease inhibitors and diseases thought to result from
inappropriate protease activity include: NADE87, REST88,
SOMM90, and SOMM89. Tryptase and chymase may be involved
in asthma; see FRAN89 and VAND89. There are reports that
suggest that Proteinase 3 (also known as p29) is as
important or even more important than HNE; see NILE89,
ARNA90, KAOR88, CAMP90, and GUPT90. Cathepsin G is
another protease that may cause disease when present in
excess; see FERR90, PETE89, SALV87, and SOMM90. These
works indicate that a problem exists and that blocking one
or another protease might well alleviate a disease state.
Some of the cited works report inhibitors having measurable
affinity for a target protease, but none report
truly excellent inhibitors that have K_d in the range of
10⁻¹² M as may be obtained by the method of the present
invention. The same IPBDs used for HNE can be used for
any serine protease.

The present invention is not, however, limited to any
of the above-identified target materials. The only
limitation is that the target material be suitable for
affinity separation.

A supply of several milligrams of pure target
material is desired. With HNE (as discussed in Examples
II and III), 400 µg of enzyme is used to prepare 200 µl of
ReactiGel beads. This amount of beads is sufficient for
as many as 40 fractionations. Impure target material
could be used, but one might obtain a protein that binds
to a contaminant instead of to the target.

The following information about the target material 
is highly desirable: 1) stability as a function of 
temperature, pH, and ionic strength, 2) stability with 
respect to chaotropes such as urea or guanidinium Cl, 3)
pI, 4) molecular weight, 5) requirements for prosthetic
groups or ions, such as haem or Ca++, and 6) proteolytic
activity, if any. It is also potentially useful to know: 
1) the target's sequence, if the target is a macro- 
molecule, 2) the 3D structure of the target, 3) enzymatic 
activity, if any, and 4) toxicity, if any. 

The user of the present invention specifies certain 
parameters of the intended use of the binding protein: 1) 
the acceptable temperature range, 2) the acceptable pH 
range, 3) the acceptable concentrations of ions and 
neutral solutes, and 4) the maximum acceptable dissocia- 
tion constant for the target and the SBD: 

    K_T = [Target][SBD]/[Target:SBD].

In some cases, the user may require discrimination between
T, the target, and N, some non-target. Let

    K_T = [T][SBD]/[T:SBD], and

    K_N = [N][SBD]/[N:SBD],

then K_T/K_N = ([T][N:SBD])/([N][T:SBD]).

The user then specifies a maximum acceptable value for the
ratio K_T/K_N.
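
As an illustration of how these two specifications might be checked against measured equilibrium concentrations, the following sketch computes K_T, K_N, and their ratio. The function names, the example concentrations, and the threshold values are all hypothetical; only the defining formulas come from the text above.

```python
def dissociation_constant(free_ligand, free_sbd, complex_conc):
    """K = [ligand][SBD]/[ligand:SBD], all concentrations in M."""
    if complex_conc <= 0:
        raise ValueError("complex concentration must be positive")
    return free_ligand * free_sbd / complex_conc


def meets_specification(k_t, k_n, max_k_t, max_ratio):
    """True if K_T and K_T/K_N are both within the user's stated limits."""
    return k_t <= max_k_t and (k_t / k_n) <= max_ratio


# Hypothetical equilibrium measurements (M).
k_t = dissociation_constant(free_ligand=1e-9, free_sbd=1e-9, complex_conc=1e-6)
k_n = dissociation_constant(free_ligand=1e-6, free_sbd=1e-9, complex_conc=1e-9)

# Hypothetical user limits: K_T no worse than 1e-10 M and K_T/K_N <= 1e-4.
ok = meets_specification(k_t, k_n, max_k_t=1e-10, max_ratio=1e-4)

print(f"K_T = {k_t:.1e} M, K_N = {k_n:.1e} M, K_T/K_N = {k_t / k_n:.1e}")
print("meets specification:", ok)
```

A smaller K_T/K_N means stronger discrimination in favor of the target, which is why the user states a maximum rather than a minimum for the ratio.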

The target material preferably is stable under the 
specified conditions of pH, temperature, and solution 
conditions.

If the target material is a protease, one considers 
the following points: 

1) a highly specific protease can be treated like any 
other target, 

2) a general protease, such as subtilisin, may degrade
the OSPs of the GP including OSP-PBDs; there are
several alternative ways of dealing with general
proteases, including: a) use a protease inhibitor as
PPBD so that the SBD is an inhibitor of the protease,
b) a chemical inhibitor may be used to prevent
proteolysis (e.g. phenylmethylsulfonyl fluoride (PMSF)
that inhibits serine proteases), c) one or more
active-site residues may be mutated to create an
inactive protein (e.g. a serine protease in which the
active serine is mutated to alanine), or d) one or
more active-site amino-acids of the protein may be
chemically modified to destroy the catalytic activity
(e.g. a serine protease in which the active serine is
converted to anhydroserine),

3) SBDs selected for binding to a protease need not be
inhibitors; SBDs that happen to inhibit the protease
target are a fairly small subset of SBDs that bind to
the protease target,

4) the more we modify the target protease, the less likely
we are to obtain an SBD that inhibits the target
protease, and

5) if the user requires that the SBD inhibit the target 
protease, then the active site of the target protease 
must not be modified any more than necessary; 
inactivation by mutation or chemical modification are 
preferred methods of inactivation and a protein 
protease inhibitor becomes a prime candidate for 
IPBD. For example, BPTI has been mutated, by the 
methods of the present invention, to bind to pro- 
teases other than trypsin. 

Examples III-VI disclose that uninhibited serine
proteases may be used as targets quite successfully and
that protein protease inhibitors derived from BPTI and
selected for binding to these immobilized proteases are
excellent inhibitors.

V.F. Immobilization or Labeling of Target Material

For chromatography, FACS, or electrophoresis there
may be a need to covalently link the target material to a
second chemical entity. For chromatography the second
entity is a matrix, for FACS the second entity is a
fluorescent dye, and for electrophoresis the second entity
is a strongly charged molecule. In many cases, no
coupling is required because the target material already
has the desired property of: a) immobility, b) fluorescence,
or c) charge. In other cases, chemical or physical
coupling is required.

Various means may be used to immobilize or label the
target materials. The means of immobilization or labeling
is, in part, determined by the nature of the target. In
particular, the physical and chemical nature of the target
material and of its functional groups determine
which types of immobilization reagents may be most easily
used.

For the purpose of selecting an immobilization
method, it may be more helpful to classify target materials
as follows: (a) solid, whether crystalline or amorphous,
and insoluble in an aqueous solvent (e.g., many
minerals, and fibrous organics such as cellulose and
silk); (b) solid, whether crystalline or amorphous, and
soluble in an aqueous solvent; (c) liquid, but insoluble
in aqueous phase (e.g., 2,3,3-trimethyldecane); or (d)
liquid, and soluble in aqueous media.

It is not necessary that the actual target material
be used in preparing the immobilized or labeled analogue
that is to be used in affinity separation; rather,
suitable reactive analogues of the target material may be
more convenient. If 2,3,3-trimethyldecane were the target
material, for example, then 2,3,3-trimethyl-10-aminodecane
would be far easier to immobilize than the parental
compound. Because the latter compound is modified at one
end of the chain, it retains almost all of the shape and
charge attributes that differentiate the former compound
from other alkanes.

Target materials that do not have reactive functional
groups may be immobilized by first creating a reactive
functional group through the use of some powerful reagent,
such as a halogen. For example, an alkane can be immobilized
for affinity separation by first halogenating it and then
reacting the halogenated derivative with an immobilized or
immobilizable amine.

In some cases, the reactive groups of the actual
target material may occupy a part on the target molecule
that is to be left undisturbed. In that case, additional
functional groups may be introduced by synthetic chemistry.
For example, the most reactive groups in cholesterol
are on the steroid ring system, viz., -OH and >C=C. We may
wish to leave this ring system as it is so that it binds
to the novel binding protein. In this case, we prepare an
analogue having a reactive group attached to the aliphatic
chain (such as 26-aminocholesterol) and immobilize this
derivative in a manner appropriate to the reactive group
so attached.

Two very general methods of immobilization are widely
used. The first is to biotinylate the compound of
interest and then bind the biotinylated derivative to
immobilized avidin. The second method is to generate
antibodies to the target material, immobilize the antibodies
by any of numerous methods, and then bind the
target material to the immobilized antibodies. Use of
antibodies is more appropriate for larger target materials;
small targets (those comprising, for example, ten or
fewer non-hydrogen atoms) may be so completely engulfed by
an antibody that very little of the target is exposed in
the target-antibody complex.

Non-covalent immobilization of hydrophobic molecules 
without resort to antibodies may also be used. A compound,
such as 2,3,3-trimethyldecane, is blended with a
matrix precursor, such as sodium alginate, and the mixture 
is extruded into a hardening solution. The resulting 
beads will have 2,3,3-trimethyldecane dispersed throughout 
and exposed on the surface. 

Other immobilization methods depend on the presence 
of particular chemical functionalities. A polypeptide 
will present -NH2 (N-terminal; Lysines), -COOH (C-terminal;
Aspartic Acids; Glutamic Acids), -OH (Serines;
Threonines; Tyrosines), and -SH (Cysteines). A polysac- 
charide has free -OH groups, as does DNA, which has a 
sugar backbone. 

The following table is a nonexhaustive review of
reactive functional groups and potential immobilization
reagents:

Group              Reagent

R-NH2              Derivatives of 2,4,6-trinitrobenzene
                   sulfonates (TNBS) (CREI84, p. 11)

R-NH2              Carboxylic acid anhydrides, e.g.
                   derivatives of succinic anhydride,
                   maleic anhydride, citraconic anhydride
                   (CREI84, p. 11)

R-NH2              Aldehydes that form reducible Schiff
                   bases (CREI84, p. 12)

guanido            cyclohexanedione derivatives
                   (CREI84, p. 14)

R-CO2H             Diazo cmpds (CREI84, p. 10)

R-CO2-             Epoxides (CREI84, p. 10)

R-OH               Carboxylic acid anhydrides

Aryl-OH            Carboxylic acid anhydrides

Indole ring        Benzyl halide and sulfenyl halides
                   (CREI84, p. 19)

R-SH               N-alkylmaleimides (CREI84, p. 21)

R-SH               ethylene imine derivatives
                   (CREI84, p. 21)

R-SH               Aryl mercury compounds (CREI84, p. 21)

R-SH               Disulfide reagents (CREI84, p. 23)

Thiol ethers       Alkyl iodides (CREI84, p. 20)

Ketones            Make Schiff's base and reduce with
                   NaBH4 (CREI84, p. 12-13)

Aldehydes          Oxidize to COOH, vide supra

R-SO3H             Convert to R-SO2Cl and react with
                   immobilized alcohol or amine.

R-PO3H             Convert to R-PO2Cl and react with
                   immobilized alcohol or amine.

CC double bonds    Add HBr and then make amine or thiol.

The next table identifies the reactive groups of a number
of potential targets.

Compound (Item#, page)*           Reactive groups or
                                  [derivatives]

prostaglandin E2 (2893,1251)      -OH, keto, -COOH, C=C

aspartame (861,132)               -NH2, -COOH, -COOCH3

haem (4558,732)                   vinyl, -COOH, Fe

bilirubin (1235,189)              vinyl, -COOH, keto, -NH-

morphine (6186,988)               -OH, -C=C-, reactive phenyl
                                  ring

codeine (2459,384)                -OH, -C=C-, reactive phenyl
                                  ring

dichlorodiphenyltrichloroethane   aromatic chlorine, aliphatic
(2832,446)                        chlorine

benzo(a)pyrene (1113,172)         [Chlorinate->amine, or make
                                  sulfonates Aryl-SO2Cl]

actinomycin D (2804,441)          aryl-NH2, -OH

cellulose                         self immobilized

hydroxylapatite                   self immobilized

cholesterol (2204,341)            -OH, >C=C-

*Note: Item# and page refer to The Merck Index, 11th
Edition.

The extensive literature on affinity chromatography 
and related techniques will provide further examples. 

Matrices suitable for use as support materials
include polystyrene, glass, agarose and other chromatographic
supports, and may be fabricated into beads,
sheets, columns, wells, and other forms as desired.
Suppliers of support material for affinity chromatography
include: Applied Protein Technologies, Cambridge, MA;
Bio-Rad Laboratories, Rockville Center, NY; Pierce Chemical
Company, Rockford, IL. Target materials are attached to
the matrix in accord with the directions of the manufacturer
of each matrix preparation with consideration of
good presentation of the target.

Early in the selection process, relatively high
concentrations of target materials may be applied to the
matrix to facilitate binding; target concentrations may
subsequently be reduced to select for higher affinity
SBDs.

V.G. Elution of Lower Affinity PBD-Bearing Genetic
Packages

The population of GPs is applied to an affinity
matrix under conditions compatible with the intended use
of the binding protein and the population is fractionated
by passage of a gradient of some solute over the column.
The process enriches for PBDs having affinity for the
target and for which the affinity for the target is least
affected by the eluants used. The enriched fractions are
those containing viable GPs that elute from the column at
greater concentration of the eluant.

The eluants preferably are capable of weakening
noncovalent interactions between the displayed PBDs and
the immobilized target material. Preferably, the eluants
do not kill the genetic package; the genetic message
corresponding to successful mini-proteins is most conveniently
amplified by reproducing the genetic package rather
than by in vitro procedures such as PCR. The list of
potential eluants includes salts (including Na+, NH4+,
Rb+, SO4--, H2PO4-, citrate, K+, Li+, Cs+, HSO4-, CO3--,
Ca++, Sr++, Cl-, PO4---, HCO3-, Mg++, Ba++, Br-, HPO4--,
and acetate), acid, heat, compounds known to bind the
target, and soluble target material (or analogues thereof).

Because bacteria continue to metabolize during
affinity separation, the choice of buffer components is
more restricted for bacteria than for bacteriophage or
spores. Neutral solutes, such as ethanol, acetone, ether,
or urea, are frequently used in protein purification and
are known to weaken non-covalent interactions between
proteins and other molecules. Many of these species are,
however, very harmful to bacteria and bacteriophage. Urea
is known not to harm M13 up to 8 M. Bacterial spores, on
the other hand, are impervious to most neutral solutes.
Several affinity separation passes may be made within a
single round of variegation. Different solutes may be
used in different analyses, salt in one, pH in the next,
etc.

Any ions or cofactors needed for stability of PBDs
(derived from IPBD) or target are included in initial and
elution buffers at appropriate levels. We first remove
GP(PBD)s that do not bind the target by washing the
matrix with the initial buffer. We determine that this
phase of washing is complete by plating aliquots of the
washes or by measuring the optical density (at 260 nm or
280 nm). The matrix is then eluted with a gradient of
increasing: a) salt, b) [H+] (decreasing pH), c) neutral
solutes, d) temperature (increasing or decreasing), or e)
some combination of these factors. The solutes in each of
the first three gradients have been found generally to
weaken non-covalent interactions between proteins and
bound molecules. Salt is a preferred solute for gradient
formation in most cases. Decreasing pH is also a highly
preferred eluant. In some cases, the preferred matrix is
not stable to low pH so that salt and urea are the most
preferred reagents. Other solutes that generally weaken
non-covalent interaction between proteins and the target
material of interest may also be used.

The uneluted genetic packages contain DNA encoding
binding domains which have a sufficiently high affinity
for the target material to resist the elution conditions.
The DNA encoding such successful binding domains may be
recovered in a variety of ways. Preferably, the bound
genetic packages are simply eluted by means of a change in
the elution conditions. Alternatively, one may culture
the genetic package in situ, or extract the
target-containing matrix with phenol (or other suitable solvent)
and amplify the DNA by PCR or by recombinant DNA techniques.
Additionally, if a site for a specific protease has
been engineered into the display vector, the specific
protease is used to cleave the binding domain from the GP.

V.H. Optimization of Affinity Chromatography Separation: 

For linear gradients, elution volume and eluant
concentration are directly related. Changes in eluant
concentration cause GPs to elute from the column. Elution
volume, however, is more easily measured and specified.
It is to be understood that the eluant concentration is
the agent causing GP release and that an eluant concentration
can be calculated from an elution volume and the
specified gradient.
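
The calculation referred to above can be sketched as follows for a linear gradient. The function and parameter names are hypothetical, and the sketch assumes that elution volume is measured from the start of the gradient and that any dead volume between mixer and column outlet is negligible.

```python
def eluant_concentration(elution_volume, c_start, c_end, gradient_volume):
    """Eluant concentration at a given elution volume of a linear gradient.

    elution_volume and gradient_volume share one unit (e.g. ml);
    c_start and c_end share another (e.g. M). Volumes past the end of
    the gradient are treated as being held at c_end.
    """
    if gradient_volume <= 0:
        raise ValueError("gradient_volume must be positive")
    v = min(max(elution_volume, 0.0), gradient_volume)
    return c_start + (c_end - c_start) * v / gradient_volume


# Hypothetical example: a 5 ml gradient from 0.010 M to 1.0 M salt;
# a GP peak observed at 2.5 ml.
c_at_peak = eluant_concentration(2.5, c_start=0.010, c_end=1.0, gradient_volume=5.0)
print(f"eluant concentration at peak: {c_at_peak:.3f} M")
```

If the dead volume of a particular apparatus is not negligible, it would be subtracted from the measured elution volume before applying the formula.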

Using a specified elution regime, we compare the
elution volumes of GP(IPBD)s with the elution volumes of
wtGP on affinity columns supporting AfM(IPBD). Comparisons
are made at various: a) amounts of IPBD/GP, b)
densities of AfM(IPBD)/(volume of matrix) (DoAMoM), c)
initial ionic strengths, d) elution rates, e) amounts of
GP/(volume of support), f) pHs, and g) temperatures,
because these are the parameters most likely to affect the
sensitivity and efficiency of the separation. We then
pick those conditions giving the best separation.

We do not optimize pH or temperature; rather we
record optimal values for the other parameters for one or
more values of pH and temperature. The pH used must be
within the range of pH for which GP(IPBD) binds the
AfM(IPBD) that is being used in this step. The conditions
of intended use specified by the user may include a
specification of pH or temperature. If pH is specified,
then pH will not be varied in eluting the column.
Decreasing pH may, however, be used to liberate bound GPs
from the matrix. Similarly, if the intended use specifies
a temperature, we will hold the affinity column at the
specified temperature during elution, but we might vary
the temperature during recovery. If the intended use
specifies the pH or temperature, then we prefer that the
affinity separation be optimized for all other parameters
at the specified pH and temperature.

In the optimization devised in this step, we preferably
use a molecule known to have moderate affinity for
the IPBD (K_d in the range 10⁻⁶ M to 10⁻⁸ M), for the
following reason. When populations of GP(vgPBD)s are
fractionated, there will be roughly three subpopulations:
a) those with no binding, b) those that have some binding
but can be washed off with high salt or low pH, and c)
those that bind very tightly and are most easily rescued
in situ. We optimize the parameters to separate (a) from
(b) rather than (b) from (c). Let PBD_W be a PBD having
weak binding to the target and PBD_S be a PBD having strong
binding. Higher DoAMoM might, for example, favor retention
of GP(PBD_W) but also make it very difficult to elute
viable GP(PBD_S). We will optimize the affinity separation
to retain GP(PBD_W) rather than to allow release of
GP(PBD_S) because a tightly bound GP(PBD_S) can be rescued
by in situ growth. If we find that DoAMoM strongly
affects the elution volume, then in Part III we may reduce
the amount of target on the affinity column when an SBD
has been found with moderately strong affinity (on the
order of 10⁻⁷ M) for the target.

In case the promoter of the osp-ipbd gene is not
regulated by a chemical inducer, we optimize DoAMoM, the
elution rate, and the amount of GP/volume of matrix. If
the optimized affinity separation is acceptable, we
proceed. If not, we develop a means to alter the amount
of IPBD per GP. Among GPs considered in the present
invention, this case could arise only for spores because
regulatable promoters are available for all other systems.

If the amount of IPBD/spore is too high, we could
engineer an operator site into the osp-ipbd gene. We
choose the operator sequence such that a repressor
sensitive to a small diffusible inducer recognizes the
operator. Alternatively, we could alter the Shine-Dalgarno
sequence to produce a lower homology with
consensus Shine-Dalgarno sequences. If the amount of
IPBD/spore is too low, we can introduce variability into
the promoter or Shine-Dalgarno sequences and screen
colonies for higher amounts of IPBD/spore.

In this step, we measure elution volumes of genetically
pure GPs that elute from the affinity matrix as
sharp bands that can be detected by UV absorption.
Alternatively, samples from effluent fractions can be
plated on suitable medium (cells or spores) or on sensitive
cells (phage) and colonies or plaques counted.

Several values of IPBD/GP, DoAMoM, elution rates,
initial ionic strengths, and loadings should be examined.
The following is only one of many ways in which the
affinity separation could be optimized. We anticipate
that optimal values of IPBD/GP and DoAMoM will be correlated
and therefore should be optimized together. The
effects of initial ionic strength, elution rate, and
amount of GP/(matrix volume) are unlikely to be strongly
correlated, and so they can be optimized independently.

For each set of parameters to be tested, the column
is eluted in a specified manner. For example, we may use
a regime called Elution Regime 1: a KCl gradient runs from
10 mM to the maximum allowed for GP(IPBD) viability in 100
fractions of 0.05 V_v, followed by 20 fractions of 0.05 V_v
at the maximum allowed KCl; the pH of the buffer is maintained
at the specified value with a convenient buffer such as
phosphate, Tris, or MOPS. Other elution regimes can be
used; what is important is that the conditions of this
optimization be similar to the conditions that are used in
Part III for selection for binding to target and recovery
of GPs from the chromatographic system.
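As an illustration only, Elution Regime 1 can be written out as a per-fraction salt schedule. The sketch below assumes a linear gradient (the text does not state the gradient shape) and uses a hypothetical maximum of 500 mM KCl; the function and parameter names are invented for this example.

```python
def elution_regime_1(max_kcl_mM, start_kcl_mM=10.0,
                     gradient_fractions=100, hold_fractions=20):
    """Return the KCl concentration (mM) of each collected fraction.

    Assumption: the gradient is linear from start_kcl_mM to max_kcl_mM.
    Each fraction is taken to be 0.05 V_v, as in the text.
    """
    span = max_kcl_mM - start_kcl_mM
    # Gradient portion: first fraction at the start value, last at the maximum.
    gradient = [start_kcl_mM + span * i / (gradient_fractions - 1)
                for i in range(gradient_fractions)]
    # Hold portion: constant maximum allowed KCl.
    hold = [max_kcl_mM] * hold_fractions
    return gradient + hold


# Hypothetical maximum KCl tolerated by the GP(IPBD); not a value from the text.
schedule = elution_regime_1(max_kcl_mM=500.0)
```

With the fraction counts given in the text, the whole regime spans 120 fractions of 0.05 V_v, that is, 6 V_v in total.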

When the osp-ipbd gene is regulated by [XINDUCE],
IPBD/GP can be controlled by varying [XINDUCE]. Appropriate
values of [XINDUCE] depend on the identity of
[XINDUCE] and the promoter; if, for example, XINDUCE is
isopropylthiogalactoside (IPTG) and the promoter is
lacUV5, then [IPTG] = 0, 0.1 uM, 1.0 uM, 10.0 uM, 100.0
uM, and 1.0 mM would be appropriate levels to test. The
range of variation of [XINDUCE] is extended until an
optimum is found or an acceptable level of expression is
obtained.

DoAMoM is varied from the maximum that the matrix
material can bind to 1% or 0.1% of this level in appropriate
steps. We anticipate that the efficiency of
separation will be a smooth function of DoAMoM so that it
is appropriate to cover a wide range of values for DoAMoM
with a coarse grid and then explore the neighborhood of
the approximate optimum with a finer grid.
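The coarse-then-fine search described above can be sketched as follows. This is a minimal illustration, assuming log-spaced grid points and a caller-supplied `score` function that returns some measure of separation quality for a given DoAMoM (in practice each score would come from a chromatography run). All names and grid sizes are hypothetical.

```python
import math


def log_grid(lo, hi, n):
    """Return n log-spaced values from lo to hi, inclusive."""
    step = (math.log10(hi) - math.log10(lo)) / (n - 1)
    return [10 ** (math.log10(lo) + i * step) for i in range(n)]


def coarse_to_fine(score, lo, hi, n_coarse=7, n_fine=9):
    """Find an approximate optimum of score() on [lo, hi].

    Assumes score is a smooth function with a single broad optimum,
    as the text anticipates for separation efficiency versus DoAMoM.
    """
    coarse = log_grid(lo, hi, n_coarse)
    best_i = max(range(n_coarse), key=lambda i: score(coarse[i]))
    # Refine between the coarse neighbours of the best coarse point.
    left = coarse[max(best_i - 1, 0)]
    right = coarse[min(best_i + 1, n_coarse - 1)]
    fine = log_grid(left, right, n_fine)
    return max(fine, key=score)


# Toy score peaking at 5% of the maximum DoAMoM (illustrative only).
toy_score = lambda d: -(math.log10(d) - math.log10(0.05)) ** 2
best = coarse_to_fine(toy_score, lo=0.001, hi=1.0)
```

Here DoAMoM is expressed as a fraction of the maximum the matrix can bind, so the range 0.001 to 1.0 corresponds to the 0.1% to 100% span mentioned in the text.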

Several values of initial ionic strength are tested,
such as 1.0 mM, 5.0 mM, 10.0 mM and 20.0 mM. Low ionic
strength favors binding between oppositely charged groups,
but could also cause GP to precipitate.

The elution rate is varied, by successive factors of
1/2, from the maximum attainable rate to 1/16 of this
value. If the lowest elution rate tested gives the best
separation, we test lower elution rates until we find an
optimum or adequate separation.

The goal of the optimization is to obtain a sharp
transition between bound and unbound GPs, triggered by
increasing salt or decreasing pH or a combination of both.
This optimization need be performed only: a) for each
temperature to be used, b) for each pH to be used, and c)
when a new GP(IPBD) is created.

V.I. Measuring the sensitivity of affinity separation:

Once the values of IPBD/GP, DoAMoM, initial ionic
strength, elution rate, and amount of GP/(volume of
affinity support) have been optimized, we determine the
sensitivity of the affinity separation (C_sensi) by the
following procedure that measures the minimum quantity of
GP(IPBD) that can be detected in the presence of a large
excess of wtGP. The user chooses a number of separation
cycles, denoted N_chrom, that will be performed before an
enrichment is abandoned; preferably, N_chrom is in the
range 6 to 10 and N_chrom must be greater than 4. Enrichment
can be terminated by isolation of a desired GP(SBD)
before N_chrom passes.

The measurement of sensitivity is significantly
expedited if GP(IPBD) and wtGP carry different selectable
markers because such markers allow easy identification of
colonies obtained by plating fractions obtained from the
chromatography column. For example, if wtGP carries
kanamycin resistance and GP(IPBD) carries ampicillin
resistance, we can plate fractions from a column on
non-selective media suitable for the GP. Transfer of colonies
onto ampicillin- or kanamycin-containing media will
determine the identity of each colony.

Mixtures of GP(IPBD) and wtGP are prepared in the
ratios of 1:V_lim, where V_lim ranges by an appropriate
factor (e.g., 1/10) over an appropriate range, typically
10^11 through 10^4. Large values of V_lim are tested first;
once a positive result is obtained for one value of V_lim,
no smaller values of V_lim need be tested. Each mixture is
applied to a column supporting, at the optimal DoAMoM, an
AfM(IPBD) having high affinity for IPBD and the column is
eluted by the specified elution regime, such as Elution
Regime 1. The last fraction that contains viable GPs and
an inoculum of the column matrix material are cultured.
If GP(IPBD) and wtGP have different selectable markers,
then transfer onto selection plates identifies each
colony. If GP(IPBD) and wtGP have no selectable markers
or the same selectable markers, then a number (e.g., 32) of
GP clonal isolates are tested for presence of IPBD. If
IPBD is not detected on the surface of any of the isolated
GPs, then GPs are pooled from: a) the last few (e.g., 3 to
5) fractions that contain viable GPs, and b) an inoculum
taken from the column matrix. The pooled GPs are cultured
and passed over the same column and enriched for GP(IPBD)
in the manner described. This process is repeated until
N_chrom passes have been performed, or until the IPBD has
been detected on the GPs. If GP(IPBD) is not detected
after N_chrom passes, V_lim is decreased and the process is
repeated.

Once a value for V_lim is found that allows recovery
of GP(IPBD)s, the factor by which V_lim is varied is
reduced and additional values are tested until V_lim is
known to within a factor of two.
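The search for V_lim described above (decade steps from large to small values, then refinement to within a factor of two) might be organized as in the following sketch. It assumes that recovery is monotonic in V_lim and that refinement proceeds by geometric bisection; neither detail is stated in the text. The `recovered` callback is hypothetical and stands for a complete enrichment experiment of up to N_chrom passes on a 1:v mixture.

```python
import math


def find_v_lim(recovered, exponents=range(11, 3, -1), tolerance=2.0):
    """Bracket the largest dilution V_lim from which GP(IPBD) is recovered.

    recovered(v) -> True if GP(IPBD) is recovered from a 1:v mixture.
    Returns (lo, hi): lo is the largest tested value that succeeded
    (None if none did); hi is the smallest tested value that failed
    (None if the largest value already succeeded).
    """
    lo, hi = None, None
    # Coarse stage: 10^11 down to 10^4, stopping at the first success.
    for e in exponents:
        v = 10.0 ** e
        if recovered(v):
            lo = v
            break
        hi = v
    if lo is None:
        return None, hi
    # Fine stage: geometric bisection until the bracket is within the tolerance.
    while hi is not None and hi / lo > tolerance:
        mid = math.sqrt(lo * hi)
        if recovered(mid):
            lo = mid
        else:
            hi = mid
    return lo, hi


# Toy stand-in for the experiment: recovery succeeds up to 1:3e7 (illustrative).
lo, hi = find_v_lim(lambda v: v <= 3e7)
```

The returned `lo` would then serve as the estimate of C_sensi, known to within the chosen tolerance.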

C_sensi equals the highest value of V_lim for which
the user can recover GP(IPBD) within N_chrom passes. The
number of chromatographic cycles (K_cyc) that were needed
to isolate GP(IPBD) gives a rough estimate of C_eff; C_eff
is approximately the K_cyc-th root of V_lim:

     C_eff ≈ exp(log_e(V_lim)/K_cyc)

For example, if V_lim were 4.0 x 10^8 and three
separation cycles were needed to isolate GP(IPBD), then
C_eff ≈ 736.
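The rough estimate above is simply a K_cyc-th root, which can be checked numerically. The function name below is chosen for illustration; the inputs are the example values from the text.

```python
import math


def estimate_c_eff(v_lim, k_cyc):
    """Rough per-cycle enrichment: the k_cyc-th root of v_lim."""
    return math.exp(math.log(v_lim) / k_cyc)


# Example from the text: V_lim = 4.0 x 10^8, three separation cycles.
c_eff = estimate_c_eff(4.0e8, 3)
```

The result falls between 736 and 737, in agreement with the value of about 736 quoted in the example.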

V.J. Measuring the efficiency of separation:

To determine C_eff more accurately, we determine the
ratio of GP(IPBD)/wtGP loaded onto an AfM(IPBD) column
that yields approximately equal amounts of GP(IPBD) and
wtGP after elution. We prepare mixtures of GP(IPBD) and
wtGP in ratios GP(IPBD):wtGP :: 1:Q; we start Q at twenty
times the approximate C_eff found above. A 1:Q mixture of
GP(IPBD) and wtGP is applied to an AfM(IPBD) column and
eluted by the specified elution regime, such as Elution
Regime 1. A sample of the last fraction that contains
viable GPs is plated at a dilution that gives well
separated colonies or plaques. The presence of IPBD or
the osp-ipbd gene in each colony or plaque can be determined
by a number of standard methods, including: a) use
of different selectable markers, b) nitrocellulose filter
lift of GPs and detection with AfM(IPBD)* (AUSU87), or c)
nitrocellulose filter lift of GPs and detection with
radiolabeled DNA that is complementary to the osp-ipbd
gene (AUSU87). Let F be the fraction of GP(IPBD) colonies
found in the last fraction containing viable GPs. When a
Q is found such that 0.20 < F < 0.80, then

     C_eff = Q * F.

If F < 0.2, then we reduce Q by an appropriate
factor (e.g., 1/10) and repeat the procedure. If F > 0.8,
then we increase Q by an appropriate factor (e.g., 2) and
repeat the procedure.
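The decision rule for adjusting Q can be summarized in a few lines. This is a sketch using the example factors from the text (1/10 and 2); the treatment of F exactly equal to 0.2 or 0.8 is an assumption, since the text leaves those boundaries unspecified, and the function name is hypothetical.

```python
def evaluate_q(q, f, low=0.20, high=0.80):
    """Apply the Q-adjustment rule to one measurement.

    q: the dilution ratio used (GP(IPBD):wtGP = 1:q).
    f: observed fraction of GP(IPBD) colonies in the last viable fraction.
    Returns (c_eff, next_q); exactly one of the two is None.
    """
    if low < f < high:
        return q * f, None      # accept: C_eff = Q * F
    if f <= low:                # assumption: boundary treated as "too low"
        return None, q / 10.0   # too few GP(IPBD): reduce Q tenfold
    return None, q * 2.0        # too many GP(IPBD): double Q
```

For instance, `evaluate_q(1000, 0.5)` accepts the measurement with C_eff = 500, while `evaluate_q(1000, 0.1)` asks for a repeat at Q = 100.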

V.K. Reducing selection due to non-specific binding: 

When affinity chromatography is used for separating
bound and unbound GPs, we may reduce non-specific binding
of GP(PBD)s to the matrix that bears the target in the
following ways:

1) we treat the column with blocking agents such as
genetically defective GPs or a solution of protein
before the population of GP(vgPBD)s is chromatographed,
and

2) we pass the population of GP(vgPBD)s over a matrix
containing no target or a different target from the
same class as the actual target prior to affinity
chromatography.

Step (1) above saturates any non-specific binding that the
affinity matrix might show toward wild-type GPs or
proteins in general; step (2) removes components of our
population that exhibit non-specific binding to the matrix
or to molecules of the same class as the target. If the
target were horse heart myoglobin, for example, a column
supporting bovine serum albumin could be used to trap GPs
exhibiting PBDs with strong non-specific binding to
proteins. If cholesterol were the target, then a hydrophobic
compound, such as p-tertiarybutylbenzyl alcohol,
could be used to remove GPs displaying PBDs having strong
non-specific binding to hydrophobic compounds. It is
anticipated that PBDs that fail to fold or that are
prematurely terminated will be non-specifically sticky.
These sequences could outnumber the PBDs having desirable
binding properties. Thus, the capacity of the initial
column that removes indiscriminately adhesive PBDs should
be greater (e.g., 5-fold greater) than the column that
supports the target molecule.

Variation in the support material (polystyrene,
glass, agarose, cellulose, etc.) in analysis of clones
carrying SBDs is used to eliminate enrichment for packages
that bind to the support material rather than the target.

FACS may be used to separate GPs that bind fluorescent
labeled target. We discriminate against artifactual
binding to the fluorescent label by using two or more
different dyes, chosen to be structurally different. GPs
isolated using target labeled with a first dye are
cultured. These GPs are then tested with target labeled
with a second dye.

Electrophoretic affinity separation uses unaltered
target so that only other ions in the buffer can give rise
to artifactual binding. Artifactual binding to the gel
material gives rise to retardation independent of field
direction and so is easily eliminated.

A variegated population of GPs will have a variety
of charges. The following 2D electrophoretic procedure
accommodates this variation in the population. First the
variegated population of GPs is electrophoresed in a gel
that contains no target material. The electrophoresis
continues until the GPs are distributed along the length
of the lane. The gels described by Serwer for phage are
very low in agarose and lack mechanical stability. The
target-free lane in which the initial electrophoresis is
conducted is separated from a square of gel that contains
target material by a removable baffle. After the first
pass, the baffle is removed and a second electrophoresis
is conducted at right angles to the first. GPs that do
not bind target migrate with unaltered mobility while GPs
that do bind target will separate from the majority that
do not bind target. A diagonal line of non-binding GPs
will form. This line is excised and discarded. Other
parts of the gel are dissolved and the GPs cultured.

V.L. Isolation of GP(PBD)s with binding-to-target phenotypes:

The harvested packages are now enriched for the
binding-to-target phenotype by use of affinity separation
involving the target material immobilized on an affinity
matrix. Packages that fail to bind to the target material
are washed away. If the packages are bacteriophage or
endospores, it may be desirable to include a bactericidal
agent, such as azide, in the buffer to prevent bacterial
growth. The buffers used in chromatography include: a)
any ions or other solutes needed to stabilize the target,
and b) any ions or other solutes needed to stabilize the
PBDs derived from the IPBD.

V.M. Recovery of packages: 

Recovery of packages that display binding to an
affinity column may be achieved in several ways, including:

1) collect fractions eluted from the column with a
gradient as described above; fractions eluting later
in the gradient contain GPs more enriched for genes
encoding PBDs with high affinity for the column,

2) elute the column with the target material in soluble
form,

3) flood the matrix with a nutritive medium and grow
the desired packages in situ,

4) remove parts of the matrix and use them to inoculate
growth medium,

5) chemically or enzymatically degrade the linkage
holding the target to the matrix so that GPs still
bound to target are eluted, or

6) degrade the packages and recover DNA with phenol or
other suitable solvent; the recovered DNA is used to
transform cells that regenerate GPs.

It is possible to utilize combinations of these methods.
It should be remembered that what we want to recover from
the affinity matrix is not the GPs per se, but the
information in them. Recovery of viable GPs is very
strongly preferred, but recovery of genetic material is
essential. If cells, spores, or virions bind irreversibly
to the matrix but are not killed, we can recover the
information through in situ cell division, germination, or
infection respectively. Proteolytic degradation of the
packages and recovery of DNA is not preferred.

Although degradation of the bound GPs and recovery
of genetic material is a possible mode of operation,
inadvertent inactivation of the GPs is very deleterious.
It is preferred that maximum limits for solutes that do
not inactivate the GPs or denature the target or the
column are determined. If the affinity matrices are
expendable, one may use conditions that denature the
column to elute GPs; before the target is denatured, a
portion of the affinity matrix should be removed for
possible use as an inoculum. As the GPs are held together
by protein-protein interactions and other non-covalent
molecular interactions, there will be cases in which the
molecular package will bind so tightly to the target
molecules on the affinity matrix that the GPs cannot be
washed off in viable form. This will only occur when very
tight binding has been obtained. In these cases, methods
(3) through (5) above can be used to obtain the bound
packages or the genetic messages from the affinity matrix.

It is possible, by manipulation of the elution
conditions, to isolate SBDs that bind to the target at one
pH (pH_b) but not at another pH (pH_o). The population is
applied at pH_b and the column is washed thoroughly at pH_b.
The column is then eluted with buffer at pH_o and GPs that
come off at the new pH are collected and cultured.
Similar procedures may be used for other solution
parameters, such as temperature. For example, GP(vgPBD)s
could be applied to a column supporting insulin. After
eluting with salt to remove GPs with little or no binding
to insulin, we elute with salt and glucose to liberate GPs
that display PBDs that bind insulin or glucose in a
competitive manner.

V.N. Amplifying the Enriched Packages 

Viable GPs having the selected binding trait are
amplified by culture in a suitable medium, or, in the case
of phage, infection into a host so cultivated. If the GPs
have been inactivated by the chromatography, the OCV
carrying the osp-pbd gene are recovered from the GP, and
introduced into a new, viable host.

V.O. Determining whether further enrichment is needed:

The probability of isolating a GP with improved
binding increases by a factor of C_eff with each separation
cycle. Let N be the number of distinct amino-acid sequences
produced by the variegation. We want to perform K separation
cycles before attempting to isolate an SBD, where K is
such that the probability of isolating a single SBD is
0.10 or higher.

     K = the smallest integer >= log_10(0.10 N)/log_10(C_eff)

For example, if N were 1.0 x 10^7 and C_eff = 6.31 x 10^2,
then log_10(1.0 x 10^6)/log_10(6.31 x 10^2) = 6.0000/2.8000 =
2.14. Therefore we would attempt to isolate SBDs after
the third separation cycle. After only two separation
cycles, the probability of finding an SBD is

     (6.31 x 10^2)^2/(1.0 x 10^7) ≈ 0.04

and attempting to isolate SBDs might be profitable.
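The arithmetic in this example can be reproduced with a short sketch. The function names are chosen for illustration, and the probability model (starting at 1/N and multiplying by C_eff per cycle, capped at 1) is an interpretation of the example calculation rather than a formula stated in general form.

```python
import math


def cycles_before_isolation(n_sequences, c_eff, target_probability=0.10):
    """Smallest integer K with K >= log10(p * N) / log10(C_eff)."""
    k = math.ceil(math.log10(target_probability * n_sequences)
                  / math.log10(c_eff))
    return max(k, 0)


def isolation_probability(n_sequences, c_eff, cycles):
    """Approximate chance of picking an SBD after the given number of cycles."""
    return min(1.0, c_eff ** cycles / n_sequences)


# Example values from the text: N = 1.0 x 10^7, C_eff = 6.31 x 10^2.
k = cycles_before_isolation(1.0e7, 6.31e2)
p_after_two = isolation_probability(1.0e7, 6.31e2, 2)
```

With the example inputs, `k` is 3 and `p_after_two` is about 0.04, matching the figures given above.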

Clonal isolates from the last fraction eluted which
contained any viable GPs, as well as clonal isolates
obtained by culturing an inoculum taken from the affinity
matrix, are cultured in a growth step that is similar to
that described previously. Other fractions may be
cultured too. If K separation cycles have been completed,
samples from a number, e.g., 32, of these clonal isolates
are tested for elution properties on the {target} column.
If none of the isolated, genetically pure GPs show
improved binding to target, or if K cycles have not yet
been completed, then we pool and culture, in a manner
similar to the manner set forth previously, the GPs from
the last few fractions eluted that contained viable GPs
and the GPs obtained by culturing an inoculum taken
from the column matrix. We then repeat the enrichment
procedure described above. This cyclic enrichment may
continue N_chrom passes or until an SBD is isolated.

If one or more of the isolated GPs has improved
retention on the {target} column, we determine whether the
retention of the candidate SBDs is due to affinity for the
target material as follows. A second column is prepared
using a different support matrix with the target material
bound at the optimal density. The elution volumes, under
the same elution conditions as used previously, of
candidate GP(SBD)s are compared to each other and to
GP(PPBD of this round). If one or more candidate GP(SBD)s
has a larger elution volume than GP(PPBD of this round),
then we pick the GP(SBD) having the highest elution volume
and proceed to characterize the population. If none of
the candidate GP(SBD)s has higher elution volume than
GP(PPBD of this round), then we pool and culture, in a
manner similar to the manner used previously, the GPs from
the last few fractions that contained viable GPs and the
GPs obtained by culturing an inoculum taken from the
column matrix. We then repeat the enrichment procedure.

If all of the SBDs show binding that is superior to
PPBD of this round, we pool and culture the GPs from the
last fraction that contains viable GPs and from the
inoculum taken from the column. This population is
re-chromatographed at least one pass to fractionate further
the GPs based on K_d.

If an RNA phage were used as GP, the RNA would
either be cultured with the assistance of a helper phage
or be reverse transcribed and the DNA amplified. The
amplified DNA could then be sequenced or subcloned into
suitable plasmids.

V.P. Characterizing the Putative SBDs: 

We characterize members of the population showing
desired binding properties by genetic and biochemical
methods. We obtain clonal isolates and test these strains
by genetic and affinity methods to determine genotype and
phenotype with respect to binding to target. For several
genetically pure isolates that show binding, we demonstrate
that the binding is caused by the artificial chimeric
gene by excising the osp-sbd gene and crossing it into the
parental GP. We also ligate the deleted backbone of each
GP from which the osp-sbd is removed and demonstrate that
each backbone alone cannot confer binding to the target on
the GP. We sequence the osp-sbd gene from several clonal
isolates. Primers for sequencing are chosen from the DNA
flanking the osp-ppbd gene or from parts of the osp-ppbd
gene that are not variegated.

The present invention is not limited to a single
method of determining protein sequences, and reference in
the appended claims to determining the amino acid sequence
of a domain is intended to include any practical method or
combination of methods, whether direct or indirect. The
preferred method, in most cases, is to determine the
sequence of the DNA that encodes the protein and then to
infer the amino acid sequence. In some cases, standard
methods of protein-sequence determination may be needed to
detect post-translational processing.

The present invention is not limited to a single
method of determining the sequence of nucleotides (nts) in
DNA subsequences. In the preferred embodiment, plasmids
are isolated and denatured in the presence of a sequencing
primer, about 20 nts long, that anneals to a region
adjacent, on the 5' side, to the region of interest. This
plasmid is then used as the template in the four sequencing
reactions with one dideoxy substrate in each.
Sequencing reactions, agarose gel electrophoresis, and
polyacrylamide gel electrophoresis (PAGE) are performed by
standard procedures (AUSU87).

For one or more clonal isolates, we may subclone the
sbd gene fragment, without the osp fragment, into an
expression vector such that each SBD can be produced as a
free protein. Because numerous unique restriction sites
were built into the inserted domain, it is easy to
subclone the gene at any time. Each SBD protein is
purified by normal means, including affinity chromatography.
Physical measurements of the strength of binding
are then made on each free SBD protein by one of the
following methods: 1) alteration of the Stokes radius as a
function of binding of the target material, measured by
characteristics of elution from a molecular sizing column
such as agarose, 2) retention of radiolabeled binding
protein on a spun affinity column to which has been
affixed the target material, or 3) retention of radiolabeled
target material on a spun affinity column to which
has been affixed the binding protein. The measurements of
binding for each free SBD are compared to the corresponding
measurements of binding for the PPBD.

In each assay, we measure the extent of binding as a
function of concentration of each protein, and other
relevant physical and chemical parameters such as salt
concentration, temperature, pH, and prosthetic group
concentrations (if any).

In addition, the SBD with highest affinity for the
target from each round is compared to the best SBD of the
previous round (IPBD for the first round) and to the IPBD
(second and later rounds) with respect to affinity for the
target material. Successive rounds of mutagenesis and
selection-through-binding yield increasing affinity until
desired levels are achieved.

If we find that the binding is not yet sufficient,
we decide which residues to vary next. If the binding is
sufficient, then we now have an expression vector bearing a
gene encoding the desired novel binding protein.

V.Q. Joint selections:

One may modify the affinity separation of the method
described to select a molecule that binds to material A
but not to material B. One needs to prepare two selection
columns, one with material A and the other with material
B. The population of genetic packages is prepared in the
manner described, but before applying the population to A,
one passes the population over the B column so as to
remove those members of the population that have high
affinity for B ("reverse affinity chromatography"). In
the preceding specification, the initial column supported
some other molecule simply to remove GP(PBD)s that
displayed PBDs having indiscriminate affinity for surfaces.

It may be necessary to amplify the population that
does not bind to B before passing it over A. Amplification
would most likely be needed if A and B were in some
ways similar and the PPBD has been selected for having
affinity for A. The optimum order of interactions might
be determined empirically. For example, to obtain an SBD
that binds A but not B, three columns could be connected
in series: a) a column supporting some compound, neither A
nor B, or only the matrix material, b) a column supporting
B, and c) a column supporting A. A population of
GP(vgPBD)s is applied to the series of columns and the columns
are washed with the buffer of constant ionic strength that
is used in the application. The columns are uncoupled,
and the third column is eluted with a gradient to isolate
GP(PBD)s that bind A but not B.

One can also generate molecules that bind to both A
and B. In this case we can use a 3D model and mutate one
face of the molecule in question to get binding to A.
One can then mutate a different face to produce binding to
B. When an SBD binds at least somewhat to both A and B,
one can mutate the chain by Diffuse Mutagenesis to refine
the binding and use a sequential joint selection for
binding to both A and B.

The materials A and B could be proteins that differ 
at only one or a few residues. For example, A could be a 
natural protein for which the gene has been cloned and B 
could be a mutant of A that retains the overall 3D 
structure of A. SBDs selected to bind A but not B 
probably bind to A near the residues that are mutated in 
B. If the mutations were picked to be in the active site 
of A (assuming A has an active site), then an SBD that
binds A but not B will bind to the active site of A and is 
likely to be an inhibitor of A. 

To obtain a protein that will bind to both A and B,
we can, alternatively, first obtain an SBD that binds A
and a different SBD that binds B. We can then combine the
genes encoding these domains so that a two-domain
single-polypeptide protein is produced. The fusion protein will
have affinity for both A and B because one of its domains
binds A and the other binds B.

One can also generate binding proteins with affinity 
for both A and B, such that these materials will compete 
for the same site on the binding protein. We guarantee 
competition by overlapping the sites for A and B. Using 
the procedures of the present invention, we first create a 
molecule that binds to target material A. We then vary a 
set of residues defined as: a) those residues that were 
varied to obtain binding to A, plus b) those residues 
close in 3D space to the residues of set (a) but that are 
internal and so are unlikely to bind directly to either A 
or B. Residues in set (b) are likely to make small 
changes in the positioning of the residues in set (a) such
that the affinities for A and B will be changed by small 
amounts. Members of these populations are selected for 
affinity to both A and B. 

V.R. Selection for non-binding: 

The method of the present invention can be used to 
select proteins that do not bind to selected targets. 
Consider a protein of pharmacological importance, such as 
streptokinase, that is antigenic to an undesirable extent. 
We can take the pharmacologically important protein as 
IPBD and antibodies against it as target. Residues on the 
surface of the pharmacologically important protein would 
be variegated and GP(PBD)s that do not bind to an antibody 
column would be collected and cultured. Surface residues 
may be identified in several ways, including: a) from a 3D 
structure, b) from hydrophobicity considerations, or c) 
chemical labeling. The 3D structure of the pharmacologically
important protein remains the preferred guide to
picking residues to vary, except now we pick residues that
are widely spaced so that we leave as little as possible
of the original surface unaltered.

Destroying binding frequently requires only that a
single amino acid in the binding interface be changed. If
polyclonal antibodies are used, we face the problem that
all or most of the strong epitopes must be altered in a
single molecule. Preferably, one would have a set of
monoclonal antibodies, or a narrow range of antibody
species. If we had a series of monoclonal antibody
columns, we could obtain one or more mutations that
abolish binding to each monoclonal antibody. We could
then combine some or all of these mutations in one
molecule to produce a pharmacologically important protein
recognized by none of the monoclonal antibodies. Such
mutants are tested to verify that the pharmacologically
interesting properties have not been altered to an
unacceptable degree by the mutations.

Typically, polyclonal antibodies display a range of
binding constants for antigen. Even if we have only
polyclonal antibodies that bind to the pharmacologically
important protein, we may proceed as follows. We engineer
the pharmacologically important protein to appear on the
surface of a replicable GP. We introduce mutations into
residues that are on the surface of the pharmacologically
important protein or into residues thought to be on the
surface of the pharmacologically important protein so that
a population of GPs is obtained. Polyclonal antibodies
are attached to a column and the population of GPs is
applied to the column at low salt. The column is eluted
with a salt gradient. The GPs that elute at the lowest
concentration of salt are those which bear pharmacologically
important proteins that have been mutated in a
way that eliminates binding to the antibodies having
maximum affinity for the pharmacologically important
protein. The GPs eluting at the lowest salt are isolated
and cultured. The isolated SBD becomes the PPBD to
further rounds of variegation so that the antigenic
determinants are successively eliminated.

V.S. Selection of PBDs for retention of structure:

Let us take an SBD with known affinity for a target
as PPBD to a variegation of a region of the PBD that is
far from the residues that were varied to create the SBD.
We can use the target as an affinity molecule to select
the PBDs that retain binding for the target, and that
presumably retain the underlying structure of the IPBD.
The variegations in this case could include insertions and
deletions that are likely to disrupt the IPBD structure.
We could also use the IPBD and AfM(IPBD) in the same way.

For example, if IPBD were BPTI and AfM(BPTI) were
trypsin, we could introduce four or five additional
residues after residue 26 and select GPs that display PBDs
having specific affinity for AfM(BPTI). Residue 26 is
chosen because it is in a turn and because it is about 25
Å from K15, a key amino acid in binding to trypsin.

The underlying structure is most likely to be
retained if insertions or deletions are made at loops or
turns.

V.T. Engineering of Antagonists 

It may be desirable to provide an antagonist to an
enzyme or receptor. This may be achieved by making a
molecule that prevents the natural substrate or agonist
from reaching the active site. Molecules that bind
directly to the active site may be either agonists or
antagonists. Thus we adopt the following strategy. We
consider enzymes and receptors together under the designation
TER (Target Enzyme or Receptor).

For most TERs, there exist chemical inhibitors that 
block the active site. Usually, these chemicals are 
useful only as research tools because of their high 
toxicity. We make two affinity matrices: one with active 
TER and one with blocked TER. We make a variegated 
population of GP(PBD)s and select for SBDs that bind to 
both forms of the enzyme, thereby obtaining SBDs that do 
not bind to the active site. We expect that SBDs will be 
found that bind to different places on the enzyme surface. 
Pairs of the sbd genes are fused with an intervening 
peptide segment. For example, if SBD-1 and SBD-2 are 
binding domains that show high affinity for the target 
enzyme and for which the binding is non-competitive, then 
the gene sbd-1::linker::sbd-2 encodes a two-domain protein 
that will show high affinity for the target. We make 
several fusions having a variety of SBDs and various 
linkers. Such compounds have a reasonable probability of 
being antagonists to the target enzyme. 
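
The selection logic of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the clone names, the competition data, and the linker names are hypothetical placeholders, and the helper functions are not part of any published protocol. The sketch keeps the clones recovered from both matrices (active TER and blocked TER) and then lists every sbd-a::linker::sbd-b design for pairs that do not compete with each other.

```python
from itertools import combinations


def non_active_site_binders(binds_active, binds_blocked):
    """Clones retained on BOTH matrices are presumed not to need a free active site."""
    return sorted(set(binds_active) & set(binds_blocked))


def candidate_fusions(sbds, competing_pairs, linkers):
    """List sbd-a::linker::sbd-b designs for every non-competing pair of SBDs."""
    designs = []
    for a, b in combinations(sbds, 2):
        if frozenset((a, b)) in competing_pairs:
            continue  # competitive binders presumably share a site; skip the pair
        for linker in linkers:
            designs.append(f"{a}::{linker}::{b}")
    return designs


# Hypothetical selection results (illustrative names only).
active = {"sbd-1", "sbd-2", "sbd-3", "sbd-4"}    # retained on active TER
blocked = {"sbd-1", "sbd-2", "sbd-4", "sbd-5"}   # retained on blocked TER
competes = {frozenset(("sbd-1", "sbd-4"))}       # pair found to bind competitively
linkers = ["linkerA", "linkerB"]                 # placeholder linker names

sbds = non_active_site_binders(active, blocked)
fusions = candidate_fusions(sbds, competes, linkers)
for design in fusions:
    print(design)
```

In practice the competition data would come from binding experiments, and each listed design would still have to be built and tested for antagonist activity.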

VI. EXPLOITATION OF SUCCESSFUL BINDING DOMAINS AND 
CORRESPONDING DNAS 

VI. A. Generally 

Using the method of the present invention, we can 
obtain a replicable genetic package that displays a novel 
protein domain having high affinity and specificity for a 
target material of interest. Such a package carries both 
amino-acid embodiments of the binding protein domain and a 
DNA embodiment of the gene encoding the novel binding 
domain. The presence of the DNA facilitates expression of 
a protein comprising the novel binding protein domain 
within a high-level expression system, which need not be 
the same system used during the developmental process. 

VI. B. Production of Novel Binding Proteins 

We can proceed to production of the novel binding 
protein in several ways, including: a) altering the 
gene encoding the binding domain so that the binding 
domain is expressed as a soluble protein, not attached to 
a genetic package (either by deleting codons 5' of those 
encoding the binding domain or by inserting stop codons 3' 
of those encoding the binding domain), b) moving the DNA 
encoding the binding domain into a known expression 
system, and c) utilizing the genetic package as a 
purification system. (If the domain is small enough, it 
may be feasible to prepare it by conventional peptide 
synthesis methods.) 

Option (c) may be illustrated as follows. Assume 
that a novel BPTI derivative has been obtained by 
selection of M13 derivatives in which a population of 
BPTI-derived domains are displayed as fusions to mature 
coat protein. Assume that a specific protease cleavage 
site (e.g. that of activated clotting factor X) is 
engineered into the amino-acid sequence between the carboxy 
terminus of the BPTI-derived domain and the mature coat 
domain. Furthermore, we alter the display system to 
maximize the number of fusion proteins displayed on each 
phage. The desired phage can be produced and purified, for 
example by centrifugation, so that no bacterial products 
remain. Treatment of the purified phage with a catalytic 
amount of factor X cleaves the binding domains from the 
phage particles. A second centrifugation step separates 
the cleaved protein from the phage, leaving a very pure 
protein preparation. 

VI. C. Mini-Protein Production 

As previously mentioned, an advantage inhering from 
the use of a mini-protein as an IPBD is that it is likely 
that the derived SBD will also behave like a mini-protein 
and will be obtainable by means of chemical synthesis. 
(The term "chemical synthesis", as used herein, includes 
the use of enzymatic agents in a cell-free environment.) 

It is also to be understood that mini-proteins 
obtained by the method of the present invention may be 
taken as lead compounds for a series of homologues that 
contain non-naturally occurring amino acids and groups 
other than amino acids. For example, one could synthesize 
a series of homologues in which each member of the series 
has one amino acid replaced by its D enantiomer. One 
could also make homologues containing constituents such as 
β-alanine, aminobutyric acid, 3-hydroxyproline, 
2-aminoadipic acid, N-ethylasparagine, norvaline, etc.; 
these would be tested for binding and other properties of 
interest, such as stability and toxicity. 

Peptides may be chemically synthesized either in 
solution or on supports. Various combinations of stepwise 
synthesis and fragment condensation may be employed. 

During synthesis, the amino acid side chains are 
protected to prevent branching. Several different 
protective groups are useful for the protection of the 
thiol groups of cysteines: 

1) 4-methoxybenzyl (MBzl; Mob) (NISH82; ZAFA88), 
removable with HF; 

2) acetamidomethyl (Acm) (NISH82; NISH86; BECK89c), 
removable with iodine; mercury ions (e.g., mercuric 
acetate); silver nitrate; and 

3) S-para-methoxybenzyl (HOUG84). 

Other thiol protective groups may be found in 
standard reference works such as Greene, PROTECTIVE GROUPS 
IN ORGANIC SYNTHESIS (1981). 

Once the polypeptide chain has been synthesized, 
disulfide bonds must be formed. Possible oxidizing 
agents include air (HOUG84; NISH86), ferricyanide (NISH82; 
HOUG84), iodine (NISH82), and performic acid (HOUG84). 
Temperature, pH, solvent, and chaotropic chemicals may 
affect the course of the oxidation. 

biologically active form: conotoxin GI (13 AA, 4 Cys) 
(NISH82); heat-stable enterotoxin ST (18 AA, 6 Cys) 
(HOUG84); analogues of ST (BHAT86); ω-conotoxin GVIA 
(27 AA, 6 Cys) (NISH86; RIVI87b); ω-conotoxin MVIIA 
(27 AA, 6 Cys) (OLIV87b); α-conotoxin SI (13 AA, 4 Cys) 
(ZAFA88); μ-conotoxin IIIa (22 AA, 6 Cys) (BECK89c, 
CRUZ89, HATA90). Sometimes, the polypeptide naturally 
folds so that the correct disulfide bonds are formed. 
Other times, it must be helped along by use of a 
differently removable protective group for each pair of 
cysteines. 
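
To see why pair-specific protection can be needed, it helps to count how many ways a set of cysteines can be paired into disulfides. The short Python sketch below uses the standard counting formula (2k)! / (2^k k!) for k disulfide bonds; the function name is an illustrative choice, and the calculation says nothing about which pairings are chemically favored.

```python
from math import factorial


def disulfide_pairings(n_cys: int) -> int:
    """Number of ways to pair n_cys cysteines completely into disulfide bonds."""
    if n_cys < 0 or n_cys % 2:
        raise ValueError("need an even, non-negative number of cysteines")
    k = n_cys // 2  # number of disulfide bonds
    return factorial(n_cys) // (2 ** k * factorial(k))


for n in (4, 6):
    print(n, "Cys:", disulfide_pairings(n), "possible pairings")
```

A peptide with 4 cysteines has 3 possible pairings and one with 6 cysteines has 15, only one of which corresponds to the native fold.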

VI. D. Uses of Novel Binding Proteins 

The successful binding domains of the present 
invention may, alone or as part of a larger protein, be 
used for any purpose for which binding proteins are 
suited, including isolation or detection of target 
materials. In furtherance of this purpose, the novel 
binding proteins may be coupled directly or indirectly, 
covalently or noncovalently, to a label, carrier or 
support. 

When used as a pharmaceutical, the novel binding 
proteins may be combined with suitable carriers or 
adjuvants. 

***** 

All references cited anywhere in this specification 
are incorporated by reference to the extent that they may 
be pertinent. 

EXAMPLE I 

DISPLAY OF BPTI AS A FUSION TO M13 GENE VIII PROTEIN: 

Example I involves display of BPTI on M13 as a fusion 
to the mature gene VIII coat protein. Each of the DNA 
constructions was confirmed by restriction digestion 
analysis and DNA sequencing. 

1. Construction of the viii-signal-sequence::bpti::mature- 
viii-coat-protein Display Vector. 

A. Operative cloning vectors (OCV). 

The operative cloning vectors are M13 and phagemids 
derived from M13 or f1. The initial construction was in 
the f1-based phagemid pGEM-3Zf(-)(™) (Promega Corp., 
Madison, WI). 

A gene comprising, in order: i) a modified lacUV5 
promoter, ii) a Shine-Dalgarno sequence, iii) DNA encoding 
the M13 gene VIII signal sequence, iv) a sequence encoding 
mature BPTI, v) a sequence encoding the mature-M13-gene- 
VIII coat protein, vi) multiple stop codons, and vii) a 
transcription terminator, was constructed. This gene is 
illustrated in Tables 101-105; each table shows the same 
DNA sequence with different features annotated. There are 
a number of differences between this gene and the one 
proposed in the hypothetical example in the generic 
specification of the parent application. Because the 
actual construction was made in pGEM-3Zf(-), the ends of 
the synthetic DNA were made compatible with SalI and 
BamHI. The lacO operator of lacUV5 was changed to the 
symmetrical lacO with the intention of achieving tighter 
repression in the absence of IPTG. Several silent codon 
changes were made to minimize the longest segment that is 
identical to wild-type gene VIII, so that genetic 
recombination with the co-existing gene VIII is unlikely. 

i) OCV based upon pGEM-3Zf. 

pGEM-3Zf(™) (Promega Corp., Madison, WI) is a 
plasmid-based vector containing the amp gene, bacterial 
origin of replication, bacteriophage f1 origin of 
replication, a lacZ operon containing a multiple cloning 
site sequence, and the T7 and SP6 polymerase binding 
sequences. 

Two restriction enzyme recognition sites were 
introduced, by site-directed oligonucleotide mutagenesis, 
at the boundaries of the lacZ operon. This allowed for the 
removal of the lacZ operon and its replacement with the 
synthetic gene. A BamHI recognition site (GGATCC) was 
introduced at the 5' end of the lacZ operon by the 
mutation of bases C331 and T332 to G and A respectively 
(numbering of Promega). A SalI recognition site (GTCGAC) 
was introduced at the 3' end of the operon by the mutation 
of bases C3021 and T3023 to G and C respectively. A 
construct combining these variants of pGEM-3Zf was 
designated pGEM-MB3/4. 
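
The idea of creating a recognition site by changing two bases can be illustrated with a short Python sketch. The 18-base sequence below is a made-up toy, not vector sequence, and the positions are chosen only so that the same kinds of substitution described above (C and T to G and A for BamHI; C and T, two bases apart, to G and C for SalI) produce the sites. The helper checks the expected original base before changing it, which guards against off-by-one numbering errors.

```python
BAMHI = "GGATCC"
SALI = "GTCGAC"


def substitute(seq, changes):
    """Apply 1-based point substitutions {position: (old_base, new_base)}."""
    bases = list(seq.upper())
    for pos, (old, new) in changes.items():
        if bases[pos - 1] != old:
            raise ValueError(f"expected {old} at position {pos}, found {bases[pos - 1]}")
        bases[pos - 1] = new
    return "".join(bases)


# Hypothetical toy sequence; it contains neither site before mutagenesis.
toy = "TTGCTTCCAACTTGACTT"

# C4->G and T5->A create GGATCC; C11->G and T13->C create GTCGAC.
mutant = substitute(toy, {4: ("C", "G"), 5: ("T", "A"),
                          11: ("C", "G"), 13: ("T", "C")})

print(toy, BAMHI in toy, SALI in toy)
print(mutant, BAMHI in mutant, SALI in mutant)
```

Applied to a real vector sequence with the supplier's numbering, the same check would confirm that each pair of substitutions yields the intended site and no other change.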

ii) OCV based upon M13mp18. 

M13mp18 (YANI85) is an M13 bacteriophage-based vector 
(available from, inter alia, New England Biolabs, Beverly, 
MA) consisting of the whole of the phage genome into 
which has been inserted a lacZ operon containing a 
multiple cloning site sequence (MESS77). Two restriction 
enzyme sites were introduced into M13mp18 using standard 
methods. A BamHI recognition site (GGATCC) was introduced 
at the 5' end of the lacZ operon by the mutation of bases 
C6003 and G6004 to A and T respectively (numbering of 
Messing). This mutation also destroyed a unique NarI 
site. A SalI recognition site (GTCGAC) was introduced at 
the 3' end of the operon by the mutation of bases A6430 
and C6432 to C and A respectively. A construct combining 
these variants of M13mp18 was designated M13-MB1/2. 

B) Synthetic Gene. 

A synthetic gene (VIII-signal-sequence::mature- 
bpti::mature-VIII-coat-protein) was constructed from 16 
synthetic oligonucleotides (Table 105), custom synthesized 
by Genetic Designs Inc. of Houston, Texas, using methods 
detailed in KIMH89 and ASHM89. Table 101 shows the DNA 
sequence; Table 102 contains an annotated version of this 
sequence. Table 103 shows the overlaps of the synthetic 
oligonucleotides in relationship to the restriction sites 
and coding sequence. Table 104 shows the synthetic DNA in 
double-stranded form. Table 105 shows each of the 16 
synthetic oligonucleotides from 5'-to-3'. The 
oligonucleotides were phosphorylated, with the exception 
of the 5'-most molecules, using standard methods, annealed 
and ligated in stages such that a final synthetic duplex 
was generated. The overhanging ends of this duplex were 
filled in with T4 DNA polymerase and it was cloned into 
the HincII site of pGEM-3Zf(-); the initial construct is 
called pGEM-MB1 (Table 101a). Double-stranded DNA of 
pGEM-MB1 was cut with PstI, filled in with T4 DNA 
polymerase and ligated to a SalI linker (New England 
BioLabs) so that the synthetic gene is bounded by BamHI 
and SalI sites (Table 101b and Table 102b). The synthetic 
gene was obtained on a BamHI-SalI cassette and cloned into 
pGEM-MB3/4 and M13-MB1/2 utilizing the BamHI and SalI 
sites previously introduced, to generate the constructs 
designated pGEM-MB16 and M13-MB15, respectively. The full 
length of the synthetic insert was sequenced and found to 
be unambiguously correct except for: 1) a missing G in the 
Shine-Dalgarno sequence; and 2) a few silent errors in the 
third bases of some codons (shown as upper case in Table 
101). Table 102 shows the ribosome-binding site A204GGAGG 
but the actual sequence is A104GAGG. Efforts to express 
protein from this construction, in vivo and in vitro, were 
unavailing. 

C) Alterations to the synthetic gene. 

i) Ribosome binding site (RBS). 
Starting with the construct pGEM-MB16, a fragment of 
DNA bounded by the restriction enzyme sites SacI and NheI 
(containing the original RBS) was replaced with a 
synthetic oligonucleotide duplex (with compatible SacI and 
NheI overhangs) containing the sequence for a new RBS that 
is very similar to the RBS of E. coli phoA and that has 
been shown to be functional. 

Original putative RBS (5'-to-3') 

GAGCTCagaggCTTACTATGAAGAAATCTCTGGTTCTTAAGGCTAGC 
|SacI|                                   |NheI| 

New RBS (5'-to-3') 

GAGCTCTggaggaAATAAAATGAAGAAATCTCTGGTTCTTAAGGCTAGC 
|SacI|                                     |NheI| 

The putative RBSs above are in lower case, and the 
initiating methionine codon is the ATG immediately 
preceding AAGAAATCT. The resulting construct was 
designated pGEM-MB20. In vitro expression of the gene 
carried by pGEM-MB20 produced a novel protein species of 
the expected size, about 14.5 kd. 
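
As a quick consistency check on the new RBS fragment, the Python sketch below locates the ggagga element, finds the first ATG downstream of it, and translates the reading frame up to the end of the NheI site. The standard genetic code table is built in the usual TCAG order; the variable names are illustrative, and the sequence is copied from the fragment shown above.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated with T, C, A, G at each position.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons of dna, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


NEW_RBS = "GAGCTCTggaggaAATAAAATGAAGAAATCTCTGGTTCTTAAGGCTAGC"
seq = NEW_RBS.upper()

sd_end = seq.find("GGAGGA") + 6   # first base after the putative RBS
start = seq.find("ATG", sd_end)   # first ATG downstream of the RBS
spacer = start - sd_end           # bases between RBS and start codon
protein = translate(seq[start:])

print("spacer length:", spacer)
print("encoded residues:", protein)
```

The translated residues give the opening of the gene VIII signal peptide, and the frame runs through the NheI site without interruption.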

ii) tac promoter. 

In order to obtain higher expression levels of the 
fusion protein, the lacUV5 promoter was changed to a tac 
promoter. Starting with the construct pGEM-MB16, which 
contains the lacUV5 promoter, a fragment of DNA bounded by 
the restriction enzyme sites BamHI and HpaII was excised 
and replaced with a compatible synthetic oligonucleotide 
duplex containing the -35 sequence of the trp promoter, cf. 
RUSS82. This converted the lacUV5 promoter to a tac 
promoter in a construct designated pGEM-MB22, Table 112. 

MB16 

5'- GATCC tctagagtcggc TTTACA ctttatgcttc cg gctcg..-3' 
3'-     G agatctcagccg aaatgt gaaatacgaag gc cgagc..-5' 
    BamHI              -35               HpaII 

MB22 insert 

5'- GATCC actccccatccccctg TTGACA attaatcat   -3' 
3'-     G tgaggggtagggggac AACTGT taattagtagc -5' 
    BamHI                  -35          (HpaII) 

Promoter and RBS variants of the fusion protein gene 
were constructed by basic DNA manipulation techniques to 
generate the following: 

            Promoter   RBS   Encoded Protein 
pGEM-MB16   lac        old   VIIIs.p.-BPTI-matureVIII 
pGEM-MB20   lac        new   " 
pGEM-MB22   tac        old   " 
pGEM-MB26   tac        new   " 

The synthetic genes from variants pGEM-MB20 and pGEM- 
MB26 were recloned into the altered phage vector M13-MB1/2 
to generate the phage constructs designated M13-MB27 and 
M13-MB28 respectively. 

iii) Signal peptide sequence. 

In vitro expression of the synthetic gene regulated 
by tac and the "new" RBS produced a novel protein of the 
expected size for the unprocessed protein (about 16 kd). 
In vivo expression also produced novel protein of full 
size; no processed protein could be seen on phage or in 
cell extracts by silver staining or by Western analysis 
with anti-BPTI antibody. 

Thus we analyzed the signal sequence of the fusion. 
Table 106 shows a number of typical signal sequences. 
Charged residues are generally thought to be of great 
importance and are shown bold and underscored. Each 
signal sequence contains a long stretch of uncharged 
residues that are mostly hydrophobic; these are shown in 
lower case. At the right, in parentheses, is the length 
of the stretch of uncharged residues. We note that the 
fusions of gene VIII signal to BPTI and gene III signal to 
BPTI have rather short uncharged segments. These short 
uncharged segments may reduce or prevent processing of the 
fusion peptides. We know that the gene III signal 
sequence is capable of directing: a) insertion of the 
peptide comprising (mature-BPTI)::(mature-gene-III- 
protein) into the lipid bilayer, and b) translocation of 
BPTI and most of the mature gene III protein across the 
lipid bilayer (vide infra). That the gene III protein 
remains anchored in the lipid bilayer until the phage is 
assembled is directed by the uncharged anchor region near 
the carboxy terminus of the mature gene III protein (see 
Table 116) and not by the secretion signal sequence. The 
phoA signal sequence can direct secretion of mature BPTI 
into the periplasm of E. coli (MARK86). Furthermore, there 
is controversy over the mechanism by which mature authentic 
gene VIII protein comes to be in the lipid bilayer prior 
to phage assembly. 
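
The "length of the stretch of uncharged residues" can be computed mechanically. The Python sketch below is an illustration only: it treats D, E, K, and R as the charged residues (an assumption; histidine is counted as uncharged), and the test sequence is the PhoA signal fused to the first three residues of BPTI exactly as the residues are printed later in this example, so it may differ from the corresponding entry in Table 106.

```python
def longest_uncharged_run(peptide, charged="DEKR"):
    """Length of the longest run of residues not in the charged set."""
    best = current = 0
    for residue in peptide.upper():
        current = 0 if residue in charged else current + 1
        best = max(best, current)
    return best


# PhoA signal (as printed in this example) followed by Arg-Pro-Asp of BPTI.
phoa_bpti = "MKQSTIALLPLLFTPVTKA" + "RPD"

print(longest_uncharged_run(phoa_bpti))
```

Running the same function over each candidate signal::BPTI junction gives a simple way to compare the uncharged segments discussed above.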

Thus we decided to replace the DNA coding on 
expression for the gene-VIII-putative-signal-sequence by 
each of: 1) DNA coding on expression for the phoA signal 
sequence, 2) DNA coding on expression for the bla signal 
sequence, or 3) DNA coding on expression for the M13 gene 
III signal. Each of these replacements produces a 
tripartite gene encoding a fusion protein that comprises, 
in order: (a) a signal peptide that directs secretion into 
the periplasm of parts (b) and (c), derived from a first 
gene; (b) an initial potential binding domain (BPTI in 
this case), derived from a second gene (in this case, the 
second gene is an animal gene); and (c) a structural 
packaging signal (the mature gene VIII coat protein), 
derived from a third gene. 

The process by which the IPBD::packaging-signal 
fusion arrives on the phage surface is illustrated in 
Figure 1. In Figure 1a, we see that authentic gene VIII 
protein appears (by whatever process) in the lipid bilayer 
so that both the amino and carboxy termini are in the 
cytoplasm. Signal peptidase-I cleaves the gene VIII 
protein liberating the signal peptide (that is absorbed by 
the cell) and mature gene VIII coat protein that spans the 
lipid bilayer. Many copies of mature gene VIII coat 
protein accumulate in the lipid bilayer awaiting phage 
assembly (Figure 1c). Some signal sequences are able to 
direct the translocation of quite large proteins across 
the lipid bilayer. If additional codons are inserted 
after the codons that encode the cleavage site of the 
signal peptidase-I of such a potent signal sequence, the 
encoded amino acids will be translocated across the lipid 
bilayer as shown in Figure 1b. After cleavage by signal 
peptidase-I, the amino acids encoded by the added codons 
will be in the periplasm but anchored to the lipid bilayer 
by the mature gene VIII coat protein, Figure 1d. The 
circular single-stranded phage DNA is extruded through a 
part of the lipid bilayer containing a high concentration 
of mature gene VIII coat protein; the carboxy terminus of 
each coat protein molecule packs near the DNA while the 
amino terminus packs on the outside. Because the fusion 
protein is identical to mature gene VIII coat protein 
within the trans-bilayer domain, the fusion protein will 
co-assemble with authentic mature gene VIII coat protein 
as shown in Figure 1e. 

In each case, the mature VIII coat protein moiety is 
intended to co-assemble with authentic mature VIII coat 
protein to produce phage particles having BPTI domains 
displayed on the surface. The source and character of the 
secretion signal sequence is not important because the 
signal sequence is cut away and degraded. The structural 
packaging signal, however, is quite important because it 
must co-assemble with the authentic coat protein to make a 
working virus sheath. 

a) Bacterial Alkaline Phosphatase (phoA) Signal Peptide. 

Construct pGEM-MB26 contains a fragment of DNA 
bounded by restriction enzyme sites SacI and AccIII which 
contains the new RBS and sequences encoding the initiating 
methionine and the signal peptide of M13 gene VIII 
pro-protein. This fragment was replaced with a synthetic 
duplex (constructed from four annealed oligonucleotides) 
containing the RBS and DNA coding for the initiating 
methionine and signal peptide of PhoA (INOU82). The 
resulting construct was designated pGEM-MB42; the sequence 
of the fusion gene is shown in Table 113. M13MB48 is a 
derivative of GemMB42. A BamHI-SalI DNA fragment from 
GemMB42, containing the gene construct, was ligated into a 
similarly cleaved vector M13MB1/2 giving rise to M13MB48. 

PhoA RBS and signal peptide sequence 

5'-GAGCTCCATGGGAGAAAATAAA.ATG.AAA.CAA.AGC.ACG.- 
   |SacI|                 met lys gln ser thr 

   .ATC.GCA.CTC.TTA.CCG.TTA.CTG.TTT.ACC.CCT.GTG.ACA.- 
    ile ala leu leu pro leu leu phe thr pro val thr 

   .AAA.GCC.CGT.CCG.GAT.-3' 
    lys ala arg pro asp 
             |AccIII| 

b) beta-lactamase signal peptide. 

To enable the introduction of the beta-lactamase 
(amp) promoter and DNA coding for the signal peptide into 
the gene encoding (mature-BPTI)::(mature-VIII-coat- 
protein), an initial manipulation of the amp gene (encoding 
beta-lactamase) was required. Starting with pGEM-3Zf, an 
AccIII recognition site (TCCGGA) was introduced into the 
amp gene adjacent to the DNA sequence encoding the amino 
acids at the beta-lactamase signal peptide cleavage site. 
Using standard methods of in vitro site-directed 
oligonucleotide mutagenesis, bases C2504 and A2501 were 
converted to T and G respectively to generate the construct 
designated pGEM-MB40. Further manipulation of pGEM-MB40 
entailed the insertion of a synthetic oligonucleotide 
linker (CGGATCCG) containing the BamHI recognition 
sequence (GGATCC) into the AatII site (GACGTC starting at 
nucleotide number 2260) to generate the construct 
designated pGEM-MB45. The DNA bounded by the restriction 
enzyme sites of BamHI and AccIII contains the amp 
promoter, amp RBS, initiating methionine and 
beta-lactamase signal peptide. This fragment was used to 
replace the corresponding fragment from pGEM-MB26 to 
generate construct pGEM-MB46. 

amp gene promoter and signal peptide sequences 

5'-GGATCCGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTT- 
   TATTTTTCTAAATACATTCAAATATGTATCCGCTCATGAGACAATAACC- 
   CTGATAAATGCTTCAATAATATTGAAAAAGGAAGAGT- 

   ATG.AGT.ATT.CAA.CAT.TTC.CGT.GTC.GCC.CTT.ATT.- 
   met ser ile gln his phe arg val ala leu ile 

   CCC.TTT.TTT.GCG.GCA.TTT.TGC.CTT.CCT.GTT.TTT.- 
   pro phe phe ala ala phe cys leu pro val phe 

   GCT.CAT.CCG.-3' 
   ala his pro.... 

c) M13-gene-III-signal::bpti::mature-VIII-coat-protein 

We may also construct, as depicted in Figure 5, M13- 
MB51 which would carry a gene encoding a fusion of M13- 
gene-III-signal-peptide to the previously described 
BPTI::mature VIII coat protein. First the BstEII site 
that follows the stop codons of the synthetic gene VIII is 
changed to an AlwNI site as follows. DNA of pGEM-MB26 is 
cut with BstEII and the ends filled in by use of Klenow 
enzyme; a blunt AlwNI linker is ligated to this DNA. This 
construction is called pGEM-MB26Alw. The XhoI to AlwNI 
fragment (approximately 300 bp) of pGEM-MB26Alw is 
purified. RF DNA from phage MK-BPTI (vide infra) is cut 
with AlwNI and XhoI and the large fragment purified. 
These two fragments are ligated together; the resulting 
construction is named M13-MB51. Because M13-MB51 contains 
no gene III, the phage cannot form plaques. M13-MB51 
can, however, render cells Km^R. Infectious phage 
particles can be obtained by use of helper phage. As 
explained below, the gene III signal sequence is capable 
of directing (BPTI)::(mature-gene-III-protein) to the 
surface of phage. In M13-MB51, we have inserted DNA 
encoding gene VIII coat protein (50 amino acids) and three 
stop codons 5' to the DNA encoding the mature gene III 
protein. 

2. Analysis of the Protein Products Encoded by the 
Synthetic (signal-peptide::mature-bpti::viii-coat-protein) 
Genes 

i) In vitro analysis 

A coupled transcription/translation prokaryotic 
system (Amersham Corp., Arlington Heights, IL) was 
utilized for the in vitro analysis of the protein products 
encoded by the BPTI/VIII synthetic gene and the variants 
derived from this. 

Table 107 lists the protein products encoded by the 
listed vectors which are visualized by the standard method 
of fluorography following in vitro synthesis in the 
presence of 35S-methionine and separation of the products 
using SDS polyacrylamide gel electrophoresis. In each 
sample a pre-beta-lactamase product (approximately 31 kd) 
can be seen. This is derived from the amp gene which is 
the common selection gene for each of the vectors. In 
addition, a (pre-BPTI/VIII) product encoded by the 
synthetic gene and variants can be seen as indicated. The 
migration of these species (approximately 14.5 kd) is 
consistent with the expected size of the encoded proteins. 

ii) In vivo analysis. 

The vectors detailed in sections (B) and (C) were 
freshly transfected into the E. coli strain XL1-Blue 
(Stratagene, La Jolla, CA) and into strain SEF'. E. coli 
strain SE6004 (LISS85) carries the prlA4 mutation and is 
more permissive in secretion than strains that carry the 
wild-type prlA allele. SE6004 is F- and is deleted for 
lacI; thus the cells cannot be infected by M13, and lacUV5 
and tac promoters cannot be regulated with IPTG. Strain 
SEF' is derived from strain SE6004 (LISS85) by crossing 
with XL1-Blue(™); the F' in XL1-Blue(™) carries Tc^R and 
lacI^q. SE6004 is streptomycin^R, Tc^S while XL1-Blue(™) is 
streptomycin^S, Tc^R so that both parental strains can be 
killed with the combination of Tc and streptomycin. SEF' 
retains the secretion-permissive phenotype of the parental 
strain, SE6004 (prlA4). 
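
The double-drug counterselection described above can be written out as a small truth table. The Python sketch below is purely illustrative: the dictionary layout and the function name are hypothetical, and the resistance values simply restate the phenotypes given in the text (SEF' is assumed to combine the streptomycin resistance of SE6004 with the Tc resistance carried on the F' from XL1-Blue).

```python
# True means resistant, False means sensitive (phenotypes as stated in the text).
STRAINS = {
    "SE6004":   {"streptomycin": True,  "Tc": False},
    "XL1-Blue": {"streptomycin": False, "Tc": True},
    # Assumed exconjugant: SE6004 background plus the Tc^R F' from XL1-Blue.
    "SEF'":     {"streptomycin": True,  "Tc": True},
}


def survives(strain, drugs):
    """A strain grows only if it is resistant to every drug in the medium."""
    return all(STRAINS[strain][drug] for drug in drugs)


medium = ("streptomycin", "Tc")
survivors = [name for name in STRAINS if survives(name, medium)]
print(survivors)
```

Only the exconjugant grows on medium containing both drugs, which is why the combination selects SEF' away from both parents.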

The fresh transfectants were grown in NZCYM medium 
(SAMB89) for 1 hour, after which IPTG was added over the 
range of concentrations 1.0 µM to 0.5 mM (to derepress the 
lacUV5 and tac promoters), and were grown for an 
additional 1.5 hours. 

Aliquots of the bacterial cells expressing the 
synthetic insert encoded proteins together with the 
appropriate controls (no vector, vector with no insert and 
zero IPTG) were lysed in SDS gel loading buffer and 
electrophoresed in 20% polyacrylamide gels containing SDS 
and urea. Duplicate gels were either silver stained 
(Daiichi, Tokyo, Japan) or electrotransferred to a nylon 
matrix (Immobilon from Millipore, Bedford, MA) for Western 
analysis by standard means using rabbit anti-BPTI 
polyclonal antibodies. 

Table 108 lists the interesting proteins visualized 
on a silver stained gel and by Western analysis of an 
identical gel. We can see clearly in the Western analysis 
that protein species containing BPTI epitopes are present 
in the test strains which are absent from the control 
strains and which are also IPTG inducible. In XL1- 
Blue(™), the migration of this species is predominantly 
that of the unprocessed form of the pro-protein although a 
small proportion of the encoded proteins appear to migrate 
at a size consistent with that of a fully processed form. 
In SEF', the processed form predominates, there being only 
a faint band corresponding to the unprocessed species. 

Thus in strain SEF', we have produced a tripartite 
fusion protein that is specifically cleaved after the 
secretion signal sequence. We believe that the mature 
protein comprises BPTI followed by the gene VIII coat 
protein and that the coat protein moiety spans the 
membrane. We believe that it is highly likely that one or 
more copies, perhaps hundreds of copies, of this protein 
will co-assemble into M13 derived phage or M13-like 
phagemids. This construction will allow us to a) 
mutagenize the BPTI domain, b) display each of the 
variants on the coat of one or more phage (one type per 
phage), and c) recover those phage that display variants 
having novel binding properties with respect to target 
materials of our choice. 

Rasched and Oberer (RASC86) report that phage 
produced in cells that express two alleles of gene VIII 
that have differences within the first 11 residues of the 
mature coat protein contain some of each protein. Thus, 
because we have achieved in vivo processing of the 
phoA(signal)::bpti::matureVIII fusion gene, it is highly 
likely that co-expression of this gene with wild-type VIII 
will lead to production of phage bearing BPTI domains on 
their surface. Mutagenesis of the bpti domain of these 
genes will provide a population of phage, each phage 
carrying a gene that codes for the variant of BPTI 
displayed on the phage surface. 

VIII Display Phage: Production, Preparation and Analysis. 

i. Phage Production. 

The OCV can be grown in XL1-Blue(™) in the absence 
of the inducing agent, IPTG. Typically, a plaque plug is 
taken from a plate and grown in 2 ml of medium, containing 
freshly diluted bacterial cells, for 6 to 8 hours. 
Following centrifugation of this culture the supernatant 
is taken and the phage titer determined. This is kept as 
a phage stock for further infection, phage production and 
display of the gene product of interest. 

A 100-fold dilution of a fresh overnight culture of 
SEF' bacterial cells in 500 ml of NZCYM medium is allowed 
to grow to a cell density of 0.4 (Ab 600 nm) in a shaker 
incubator at 37°C. To this culture is added a sufficient 
amount of the phage stock to give a MOI of 10 together 
with IPTG to give a final concentration of 0.5 mM. The 
culture is allowed to grow for a further 2 hrs. 

ii. Phage Preparation and Purification. 

The phage-producing bacterial culture is centrifuged 
to separate the phage in the supernatant from the 
bacterial pellet. To the supernatant is added one quarter 
by volume of phage precipitation solution (20% PEG, 3.75 M 
ammonium acetate) and PMSF to a final concentration of 
1 mM. It is left on ice for 2 hours after which the 
precipitated phage is retrieved by centrifugation. The 
phage pellet is redissolved in TrisEDTA containing 0.1% 
Sarkosyl and left at 4°C for 1 hour after which any 
bacteria and bacterial debris are removed by 
centrifugation. The phage in the supernatant is 
reprecipitated with PEG overnight at 4°C. The phage pellet 
is resuspended in LB medium and reprecipitated another two 
times to remove the detergent. The phage is stored in LB 
medium at 4°C, titered and used for analysis and binding 
studies. 
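
Adding one quarter volume of precipitation solution dilutes its components fivefold. The Python sketch below works through that arithmetic as a hedged illustration: the function name is hypothetical, and the 500 ml supernatant volume is an assumption (the culture volume, ignoring the cell pellet).

```python
def final_concentration(stock, added_volume, sample_volume):
    """Concentration of a stock component after mixing into the sample."""
    return stock * added_volume / (added_volume + sample_volume)


supernatant_ml = 500.0                 # assumed: roughly the culture volume
precip_ml = supernatant_ml / 4         # "one quarter by volume"

peg_percent = final_concentration(20.0, precip_ml, supernatant_ml)
acetate_molar = final_concentration(3.75, precip_ml, supernatant_ml)

print(f"add {precip_ml:.0f} ml precipitation solution")
print(f"final PEG: {peg_percent:.1f}%")
print(f"final ammonium acetate: {acetate_molar:.2f} M")
```

The result is independent of the starting volume: the mixture always ends up at 4% PEG and 0.75 M ammonium acetate.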

A more stringent phage purification scheme involves 
centrifugation in a CsCl gradient. 3.86 g of CsCl is 
dissolved in NET buffer (0.1 M NaCl, 1 mM EDTA, 0.1 M Tris 
pH 7.7) up to a volume of 10 ml. 10^12 to 10^13 phage in TE 
Sarkosyl buffer are mixed with 5 ml of CsCl NET buffer and 
transferred to a sealable ultracentrifuge tube. 
Centrifugation is performed overnight at 34K rpm in a 
Sorvall OTD-65B Ultracentrifuge. The tubes are opened and 
400 µl aliquots are carefully removed. 5 µl aliquots are 
removed from the fractions and analysed by agarose gel 
electrophoresis after heating at 65°C for 15 minutes 
together with the gel loading buffer containing 0.1% SDS. 
Fractions containing phage are pooled, the phage 
reprecipitated and finally redissolved in LB medium to a 
concentration of 10^12 to 10^13 phage per ml. 

iii. Phage Analysis. 

The display phage, together with appropriate controls, 
are analyzed using standard methods of polyacrylamide gel 
electrophoresis and either silver staining of the gel or 
electrotransfer to a nylon matrix followed by analysis 
with anti-BPTI antiserum (Western analysis). Quantitation 
of the display of heterologous proteins is achieved by 
running a serial dilution of the starting protein, for 
example BPTI, together with the display phage samples in 
the electrophoresis and Western analyses described above. 
An alternative method involves running a 2-fold serial 
dilution of a phage in which both the major coat protein 
and the fusion protein are visualized by silver staining. 
A comparison of the relative ratios of the two protein 
species allows one to estimate the number of fusion 
proteins per phage since the number of VIII gene encoded 
proteins per phage (approximately 3000) is known. 

Incorporation of fusion protein into bacteriophage. 

In vivo expression of the processed BPTI:VIII fusion 
protein, encoded by vectors GemMB42 (above and Table 113) 
and M13MB48 (above), implied that the processed fusion 
product was likely to be correctly located within the 
bacterial cell membrane. This localization made it 
possible that it could be incorporated into the phage and 
that the BPTI moiety would be displayed at the 
bacteriophage surface. 

SEF' cells were infected with either M13MB48 
(consisting of the starting phage vector M13mp18, altered 
as described above, containing the synthetic gene 
consisting of a tac promoter, functional ribosome binding 
site, phoA signal peptide, mature BPTI and mature major 
coat protein) or M13mp18, as a control. Phage infections, 
preparation and purification were performed as described 
in Example VIII. 

The resulting phage were electrophoresed 
(approximately 10^11 phage per lane) in a 20% 
polyacrylamide gel containing urea, followed by 
electrotransfer to a nylon matrix and western analysis 
using anti-BPTI rabbit serum. A single species of protein 
was observed in phage derived from infection with the 
M13MB48 stock phage which was not observed in the control 
infection. This protein had a migration of about 12 kd, 
consistent with that of the fully processed fusion 
protein. 

Western analysis of SEF' bacterial lysate with or 
without phage infection demonstrated another species of 
protein of about 20 kd. This species was also present, to 
a lesser degree, in phage preparations which were simply 
PEG precipitated without further purification (for 
example, using nonionic detergent or by CsCl gradient 
centrifugation). A comparison of M13MB48 phage 
preparations made in the presence or absence of detergent 
demonstrated that Sarkosyl treatment and CsCl gradient 
purification did remove the bacterial contaminant while 
having no effect on the presence of the BPTI:VIII fusion 
protein. This indicates that the fusion protein has been 
incorporated and is a constituent of the phage body. 

The time course of phage production and BPTI:VIII 
incorporation was followed post-infection and after IPTG 
induction. Phage production and fusion protein 
incorporation appeared to be maximal after two hours. This 
time course was utilized in further phage productions and 
analyses. 

Polyacrylamide electrophoresis of the phage 
preparations, followed by silver staining, demonstrated 
that the preparations were essentially free of 
contaminating protein species and that an extra protein 
band was present in M13MB48 derived phage which was not 
present in the control phage. The size of the new protein 
was consistent with that seen by western analysis. A 
similar analysis of a serially diluted BPTI:VIII 
incorporated phage demonstrated that the ratio of fusion 
protein to major coat protein was typically in the range 
of 1:150. Since the phage is known to contain on the order 
of 3000 copies of the gene VIII product, this means that 
the phage population contains, on average, tens of copies 
of the fusion protein per phage. 
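
The estimate above is proportional arithmetic. A minimal sketch in Python, assuming the approximate figure of 3000 gene VIII copies per phage and the 1:150 band ratio quoted in the text; the function name is illustrative only and is not part of the original procedure:

```python
def fusion_copies_per_phage(fusion_to_coat_ratio, coat_copies=3000):
    """Estimate fusion proteins per phage from the fusion:coat band ratio.

    coat_copies defaults to the approximate gene VIII copy number
    quoted in the text (about 3000 per phage).
    """
    return fusion_to_coat_ratio * coat_copies


# A 1:150 ratio of fusion protein to major coat protein
copies = fusion_copies_per_phage(1 / 150)
print(round(copies))  # 20, i.e. on the order of tens of copies per phage
```

Because the ratio is read from silver-stained bands, the result is best treated as an order-of-magnitude estimate.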

Altering the initiating methionine of the natural gene 
VIII. 

The OCV M13MB48 contains the synthetic gene encoding 
the BPTI:VIII fusion protein in the intergenic region of 
the modified M13mp18 phage vector. The remainder of the 
vector consists of the M13 genome, which contains the 
genes necessary for various bacteriophage functions, such 
as DNA replication and phage formation. In an attempt to 
increase the phage incorporation of the fusion protein, we 
decided to try to diminish the production of the natural 
gene VIII product, the major coat protein, by altering the 
codon for the initiating methionine of this gene to one 
encoding leucine. In such cases, methionine is actually 
incorporated, but the rate of initiation is reduced. The 
change was achieved by standard methods of site-specific 
oligonucleotide mutagenesis as follows. 

                          M   K   K   S  - rest of VIII 
           ACT.TCC.TC.ATG.AAA.AAG.TCT. 
rest of XI - T   S   S  stop 

               Site-specific mutagenesis. 

                         (L)  K   K   S  - rest of VIII 
           ACT.TCC.AG.CTG.AAA.AAG.TCT. 
rest of XI - T   S   S  stop 

Note that the 3' end of the XI gene overlaps with the 
5' end of the VIII gene. Changes in DNA sequence were 
designed such that the desired change in the VIII gene 
product could be achieved without alterations to the 
predicted amino acid sequence of the gene XI product. A 
diagnostic PvuII recognition site was introduced at this 
site. 

It was anticipated that initiation of the natural 
gene VIII product would be hindered, enabling a higher 
proportion of the fusion protein to be incorporated into 
the resulting phage. 
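
The design constraints stated for this mutation (gene VIII start codon changed to a leucine codon, gene XI reading frame unchanged, new PvuII site) can be checked mechanically. The following is a minimal sketch, assuming the 20-nucleotide stretch shown in the diagram, the standard genetic code, and the PvuII recognition sequence CAGCTG; the codon table is deliberately limited to the codons that occur here, and all names are illustrative:

```python
# Minimal codon table covering only the codons that occur in this stretch.
CODONS = {
    "ACT": "T", "TCC": "S", "TCA": "S", "AGC": "S", "TGA": "*",
    "ATG": "M", "CTG": "L", "AAA": "K", "AAG": "K", "TCT": "S",
}

WILD_TYPE = "ACTTCCTCATGAAAAAGTCT"  # sequence from the diagram, dots removed
MUTANT = "ACTTCCAGCTGAAAAAGTCT"
PVU_II = "CAGCTG"                   # PvuII recognition sequence
VIII_START = 8                      # offset of the gene VIII start codon


def translate(seq, start, n_codons):
    """Translate n_codons codons of seq beginning at offset start."""
    codons = (seq[start + 3 * i: start + 3 * i + 3] for i in range(n_codons))
    return "".join(CODONS[c] for c in codons)


for name, seq in (("wild type", WILD_TYPE), ("mutant", MUTANT)):
    xi_tail = translate(seq, 0, 4)             # end of gene XI
    viii_head = translate(seq, VIII_START, 4)  # start of gene VIII
    print(name, xi_tail, viii_head, PVU_II in seq)
```

The gene XI frame reads T-S-S-stop in both sequences, the gene VIII frame changes from M-K-K-S to (L)-K-K-S, and only the mutant contains CAGCTG. As the text notes, the leucine codon is expected to be read as methionine at initiation, which a plain codon table does not capture.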

Analyses of the phage derived from this modified 
vector indicated that there was a significant increase in 
the ratio of fusion protein to major coat protein. 
Quantitative estimates indicated that within a phage 
population as many as 100 copies of the BPTI:VIII fusion 
were incorporated per phage. 

Incorporation of interdomain extension fusion proteins 
into phage. 

A phage pool containing a variegated pentapeptide 
extension at the BPTI:coat protein interface (see Example 
VII) was used to infect SEF' cells. IPTG induction, phage 
production and preparation were as described in Example 
VIII. Using the criteria detailed in the previous 
section, it was determined that extended fusion proteins 
were incorporated into phage. Gel electrophoresis of the 
generated phage, followed by either silver staining or 
western analysis with anti-BPTI rabbit serum, demonstrated 
fusion proteins that migrated similarly to, but 
discernibly slower than, the starting fusion protein. 

With regard to the 'EGGGS linker' extensions of the 
domain interface, individual phage stocks predicted to 
contain one or more 5-amino-acid unit extensions were 
analyzed in a similar fashion. The migration of the 
extended fusion proteins was readily distinguishable from 
that of the parent fusion protein when viewed by western 
analysis or silver staining. Those clones analyzed in more 
detail included M13.3X4 (which contains a single inverted 
EGGGS linker with a predicted amino acid sequence of 
GGGSL), M13.3X7 (which contains a correctly orientated 
linker with a predicted amino acid sequence of EGGGS), 
M13.3X11 (which contains 3 linkers with an inversion and a 
predicted amino acid sequence for the extension of 
EGGGSGSSSLGSSSL) and M13.3Xd, which contains an extension 
consisting of at least 5 linkers or 25 amino acids. 

The extended fusion proteins were all incorporated 
into phage at high levels (on average tens of copies per 
phage were present) and, when analyzed by gel 
electrophoresis, migrated at rates consistent with the 
predicted size of the extension. Clones M13.3X4 and 
M13.3X7 migrated at a position very similar to but 
discernibly different from the parent fusion protein, 
while M13.3X11 and M13.3Xd were markedly larger. 

Display of BPTI:VIII fusion protein by bacteriophage. 

The BPTI:VIII fusion protein had been shown to be 
incorporated into the body of the phage. This phage was 
analyzed further to demonstrate that the BPTI moiety was 
accessible to specific antibodies and hence displayed at 
the phage surface. 

The assay is detailed in section EE, but principally 
involves the addition of purified anti-BPTI IgG (from the 
serum of BPTI-injected rabbits) to a known titer of phage. 
Following incubation, protein A-agarose beads are added to 
bind the IgG and left to incubate overnight. The 
IgG-protein A beads and any bound phage are removed by 
centrifugation, followed by a retitering of the 
supernatant to determine any loss of phage. The phage 
bound to the beads can also be acid eluted and titered. 
Appropriate controls are included in the assay, such as a 
wild type phage stock (M13mp18) and IgG purified from 
normal rabbit pre-immune serum. 

Table 140 shows that while the titer of the wild type 
phage is unaltered by the presence of anti-BPTI IgG, 
BPTI-IIIMK (the positive control for the assay) 
demonstrated a significant drop in titer with or without 
the extra addition of protein A beads. (Note that since 
the BPTI moiety is part of the III gene product, which is 
involved in the binding of phage to bacterial pili, such a 
phenomenon is entirely expected.) Two batches of M13MB48 
phage (containing the BPTI:VIII fusion protein) 
demonstrated a significant reduction in titer, as judged 
by plaque forming units, when anti-BPTI antibodies and 
protein A beads were added to the phage. The initial drop 
in titer with the antibody alone differs somewhat between 
the two batches of phage. This may be a result of 
experimental or batch variation. Retrieval of the 
immunoprecipitated phage, while not quantitative, was 
significant when compared to the wild type phage control. 

Further control experiments relating to this section 
are shown in Table 141 and Table 142. The data 
demonstrated that the loss in titer observed for the 
BPTI:VIII containing phage is a result of the display of 
BPTI epitopes by these phage and the specific interaction 
with anti-BPTI antibodies. No significant interaction with 
either protein A agarose beads or IgG purified from normal 
rabbit serum could be demonstrated. The larger drop in 
titer for M13MB48 batch five reflects the higher level of 
incorporation of the fusion protein in this preparation. 

Functionality of the BPTI moiety in the BPTI:VIII display 
phage. 

The previous two sections demonstrated that the 
BPTI:VIII fusion protein has been incorporated into the 
phage body and that the BPTI moiety is displayed at the 
phage surface. To demonstrate that the displayed molecule 
is functional, binding experiments were performed in a 
manner almost identical to that described in the previous 
section except that proteases were used in place of 
antibodies. The display phage, together with appropriate 
controls, are allowed to interact with immobilized 
proteases or immobilized inactivated proteases. Binding 
can be assessed by monitoring the loss in titer of the 
display phage or by determining the number of phage bound 
to the respective beads. 

Table 143 shows the results of an experiment in which 
BPTI:VIII display phage, M13MB48, were allowed to bind to 
anhydrotrypsin-agarose beads. There was a significant 
drop in titer when compared to wild type phage, which do 
not display BPTI. A pool of phage (5AA Pool), each 
containing a variegated 5 amino acid extension at the 
BPTI:major coat protein interface, demonstrated a similar 
decline in titer. In a control experiment (Table 143), 
very little non-specific binding of the above display 
phage was observed with agarose beads to which an 
unrelated protein (streptavidin) is attached. 

Actual binding of the display phage is demonstrated 
by the data shown for two experiments in Table 144. The 
negative control is wild type M13mp18 and the positive 
control is BPTI-IIIMK, a phage in which the BPTI moiety, 
attached to the gene III protein, has been shown to be 
displayed and functional. M13MB48 and M13MB56 both bind 
to anhydrotrypsin beads in a manner comparable to that of 
the positive control, being 40 to 60 times better than the 
negative control (non-display phage). Hence functionality 
of the BPTI moiety, in the major coat fusion protein, was 
established. 

To take this analysis one step further, a comparison 
of phage binding to active and inactivated trypsin is 
shown in Table 145. The control phage, M13mp18 and 
BPTI-IIIMK, demonstrated binding similar to that detailed 
in Example III. Note that the relative binding is enhanced 
with trypsin due to the apparent marked reduction in the 
non-specific binding of the wild type phage to the active 
protease. M13.3X7 and M13.3X11, which both contain 
'EGGGS' linker extensions at the domain interface, bound 
to anhydrotrypsin and trypsin in a manner similar to 
BPTI-IIIMK phage. The binding, relative to non-display 
phage, was approximately 100 fold higher in the 
anhydrotrypsin binding assay and at least 1000 fold higher 
in the trypsin binding assay. The binding of another 
'EGGGS' linker variant (M13.3Xd) was similar to that of 
M13.3X7. 

To demonstrate the specificity of binding, the assays 
were repeated with human neutrophil elastase (HNE) beads 
and compared to that seen with trypsin beads (Table 146). 
BPTI has a very high affinity for trypsin and a low 
affinity for HNE; hence the BPTI display phage should 
reflect these affinities when used in binding assays with 
these beads. The negative and positive controls for 
trypsin binding were as already described above, while an 
additional positive control for the HNE beads, 
BPTI(K15L,MGNG)-III MA (see Example III), was included. 
The results, shown in Table 146, confirmed this 
prediction. M13MB48, M13.3X7 and M13.3X11 phage 
demonstrated good binding to trypsin, relative to wild 
type phage and the HNE control (BPTI(K15L,MGNG)-III MA), 
being comparable to BPTI-IIIMK phage. Conversely, poor 
binding occurred when HNE beads were used, with the 
exception of the HNE positive control phage. 

Taken together the accumulated data demonstrated that 
when BPTI is part of a fusion protein with the major coat 
30 protein of M13 phage, the molecule is both displayed at 
the surface of the phage and a significant proportion of 
it is functional in a specific protease binding manner. 



EXAMPLE II 

CONSTRUCTION OF BPTI/GENE-III DISPLAY VECTOR 

DNA manipulations were conducted according to 
standard procedures as described in Maniatis et al. 
(MANI82). First the unwanted lacZ gene of M13-MB1/2 was 
removed. M13-MB1/2 RF was cut with BamHI and SalI and the 
large fragment was isolated by agarose gel 
electrophoresis. The recovered 6819 bp fragment was filled 
in with Klenow fragment of E. coli DNA polymerase and 
ligated to a synthetic HindIII 8mer linker (CAAGCTTG). The 
ligation sample was used to transfect competent 
XL1-Blue(TM) (Stratagene, La Jolla, CA) cells which were 
subsequently plated for plaque formation. RF DNA was 
prepared from chosen plaques and a clone, M13-MB1/2-delta, 
containing regenerated BamHI and SalI sites as well as a 
new HindIII site, all 500 bp upstream of the BglII site 
(6935), was picked. 

A unique NarI site was introduced into codons 17 and 
18 of gene III (changing the amino acids from H-S to G-A, 
cf. Table 110). 10^6 phage produced from bacterial cells 
harboring the M13-MB1/2-delta RF DNA were used to infect a 
culture of CJ236 cells (relevant genotype: F', dut1, ung1, 
CmR) (OD595 = 0.35). Following overnight incubation at 
37°C, phage were recovered and uracil-containing ss DNA 
was extracted from phage in accord with the instructions 
for the MUTA-GENE(R) M13 in vitro Mutagenesis Kit 
(Catalogue Number 170-3571, Bio-Rad, Richmond, CA). Two 
hundred nanograms of the purified single stranded DNA was 
annealed to 3 picomoles of a phosphorylated 25mer 
mutagenic oligonucleotide, 

     5'-gtttcagcggCgCCagaatagaaag-3', 

(where upper case indicates the changes). Following 
filling in with T4 DNA polymerase and ligation with T4 DNA 
ligase, the reaction sample was used to transfect 
competent XL1-Blue(TM) cells which were subsequently 
plated to permit the formation of plaques. 
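
The properties claimed for the mutagenic oligonucleotide can be checked directly from its sequence. A minimal sketch, assuming the NarI recognition sequence GGCGCC and the standard genetic code (in which GGC and GCC encode glycine and alanine, matching the stated G-A change when the site is read in frame); variable names are illustrative:

```python
NAR_I = "GGCGCC"                       # NarI recognition sequence
OLIGO = "gtttcagcggCgCCagaatagaaag"    # upper case marks the changed bases

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (case-insensitive)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


oligo_length = len(OLIGO)
changed_bases = sum(1 for base in OLIGO if base.isupper())
has_nar_i = NAR_I in OLIGO.upper()
# A palindromic site is present on both strands at the same position.
site_is_palindromic = reverse_complement(NAR_I) == NAR_I

print(oligo_length, changed_bases, has_nar_i, site_is_palindromic)
```

The oligonucleotide is a 25mer with three changed bases, and those changes complete a single GGCGCC site.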

RF DNA, isolated from phage-infected cells which had 
been allowed to propagate in liquid culture for 8 hours, 
was denatured, spotted on a Nytran membrane, baked and 
hybridized to the 25mer mutagenic oligonucleotide which 
had previously been phosphorylated with 32P-ATP. Clones 
exhibiting strong hybridization signals at 70°C (6°C less 
than the theoretical Tm of the mutagenic oligonucleotide) 
were chosen for large scale RF preparation. The presence 
of a unique NarI site at nucleotide 1630 was confirmed by 
restriction enzyme analysis. The resultant RF DNA, 
M13-MB1/2-delta-NarI, was cut with BamHI, 
dephosphorylated with calf intestinal phosphatase, and 
ligated to a 1.3 Kb BamHI fragment, encoding the 
kanamycin-resistance gene (kan), derived from plasmid 
pUC4K (Pharmacia, Piscataway, NJ). The ligation sample was 
used to transfect competent XL1-Blue(TM) cells which were 
subsequently plated onto LB plates containing kanamycin 
(Km). RF DNA prepared from KmR colonies was subjected to 
restriction enzyme analysis to confirm the insertion of 
kan into M13-MB1/2-delta-NarI DNA, thereby creating the 
phage MK. Phage MK grows as well as wild-type M13, 
indicating that the changes at the cleavage site of gene 
III protein are not detectably deleterious to the phage. 
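
The text does not state how the theoretical Tm of the mutagenic oligonucleotide was obtained. As an illustration only, the simple rule of thumb of 2°C per A or T and 4°C per G or C (often called the Wallace rule, and usually applied to short oligonucleotides) gives a value that matches the stated 6°C offset from the 70°C hybridization temperature. Using this rule here is an assumption, not the authors' documented method:

```python
def wallace_tm(oligo):
    """Rule-of-thumb melting temperature: 2 C per A/T plus 4 C per G/C."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


tm = wallace_tm("gtttcagcggCgCCagaatagaaag")
print(tm, tm - 6)  # 76 70
```

The oligonucleotide has 12 A/T and 13 G/C positions, so the rule gives 76°C, and 6°C below that is 70°C.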

INSERTION OF SYNTHETIC BPTI GENE 

The construction of the BPTI-III expression vector is 
shown in Figure 6. The synthetic bpti-VIII fusion 
contains a NarI site that comprises the last two codons of 
the BPTI-encoding region. A second NarI site was 
introduced upstream of the BPTI-encoding region as 
follows. RF DNA of phage M13-MB26 was cut with AccIII and 
ligated to the dsDNA adaptor: 




     5'-TATTCTGGCGCCCGT    -3' 
     3'-ATAAGACCGCGGGCAGGCC-5' 
              | NarI|   | AccIII 



The ligation sample was subsequently restricted with NarI 
and a 180 bp DNA fragment encoding BPTI was isolated by 
agarose gel electrophoresis. RF DNA of phage MK was 
digested with NarI, dephosphorylated with calf intestinal 
phosphatase and ligated to the 180 bp fragment. Ligation 
samples were used to transfect competent XL1-Blue(TM) 
cells which were plated to enable the formation of 
plaques. DNA, isolated from phage derived from plaques, 
was denatured, applied to a Nytran membrane, baked and 
hybridized to a 32P-phosphorylated double stranded DNA 
probe corresponding to the BPTI gene. Large scale RF 
preparations were made for clones exhibiting a strong 
hybridization signal. Restriction enzyme digestion 
analysis confirmed the insertion of a single copy of the 
synthetic BPTI gene into gene III of MK to generate phage 
MK-BPTI. Subsequent DNA sequencing confirmed that the 
sequence of the bpti-III fusion gene is correct and that 
the correct reading frame is maintained (Table 111). 
Table 116 shows the entire coding region, the translation 
into protein sequence, and the functional parts of the 
polypeptide chain. 
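
The dsDNA adaptor used in this construction can be checked for internal consistency: the two strands should pair over the length of the top strand, the duplex should contain a NarI site, and the unpaired end should be compatible with an AccIII cut. A minimal sketch, assuming the recognition sequences GGCGCC for NarI and TCCGGA for AccIII (cut so as to leave a 5'-CCGG overhang); names are illustrative:

```python
TOP = "TATTCTGGCGCCCGT"                 # top strand, written 5'->3'
BOTTOM_3_TO_5 = "ATAAGACCGCGGGCAGGCC"   # bottom strand, written 3'->5'

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Split the bottom strand into the part paired with TOP and the overhang.
paired = BOTTOM_3_TO_5[:len(TOP)]
overhang_5_to_3 = BOTTOM_3_TO_5[len(TOP):][::-1]

strands_pair = all(COMPLEMENT[t] == b for t, b in zip(TOP, paired))
has_nar_i = "GGCGCC" in TOP

print(strands_pair, has_nar_i, overhang_5_to_3)  # True True CCGG
```

The four-base CCGG overhang is what allows the adaptor to ligate to AccIII-cut M13-MB26 RF DNA, placing the new NarI site immediately upstream.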

EXPRESSION OF THE BPTI-III FUSION GENE IN VITRO 

MK-BPTI RF DNA was added to a coupled prokaryotic 
transcription-translation extract (Amersham). Newly 
synthesized radiolabelled proteins were produced and 
subsequently separated by electrophoresis on a 15% 
SDS-polyacrylamide gel, which was then subjected to 
fluorography. The MK-BPTI DNA directs the synthesis of an 
unprocessed gene III fusion protein which is 7 Kd larger 
than the gene III product encoded by MK. This is 
consistent with the insertion of 58 amino acids of BPTI 
into the gene III protein. Immunoprecipitation of 
radiolabelled proteins generated by the cell-free 
prokaryotic extract was conducted. Neither rabbit 
anti(M13-gene-VIII-protein) IgG nor normal rabbit IgG was 
able to immunoprecipitate the gene III protein encoded by 
either MK or MK-BPTI. However, rabbit anti-BPTI IgG is 
able to immunoprecipitate the gene III protein encoded by 
MK-BPTI but not by MK. This confirms that the increase in 
size of the III protein encoded by MK-BPTI is attributable 
to the insertion of the BPTI protein. 

WESTERN ANALYSIS 

Phage were recovered from bacterial cultures by PEG 
precipitation. To remove residual bacterial cells, 
recovered phage were resuspended in a high salt buffer and 
subjected to centrifugation, in accord with the 
instructions for the MUTA-GENE(R) M13 in vitro Mutagenesis 
Kit (Catalogue Number 170-3571, Bio-Rad, Richmond, CA). 
Aliquots of phage (containing up to 40 µg of protein) were 
subjected to electrophoresis on a 12.5% 
SDS-urea-polyacrylamide gel and proteins were transferred 
to a sheet of Immobilon by electro-transfer. Western blots 
were developed using rabbit anti-BPTI serum, which had 
previously been incubated with an E. coli extract, 
followed by goat anti-rabbit antibody conjugated to 
alkaline phosphatase. An immunoreactive protein of 67 Kd 
is detected in preparations of the MK-BPTI but not the MK 
phage. The size of the immunoreactive protein is 
consistent with the predicted size of a processed BPTI-III 
fusion protein (6.4 Kd plus 60 Kd). These data indicate 
that BPTI-specific epitopes are presented on the surface 
of the MK-BPTI phage but not the MK phage. 
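
The size figures quoted for the fusion protein can be sanity-checked with a rough mass estimate. A minimal sketch, assuming an average residue mass of about 110 Da (a common rule of thumb, not a value given in the text) and the 60 Kd figure quoted above for the gene III protein:

```python
AVG_RESIDUE_MASS_DA = 110  # rule-of-thumb average; an assumption, not from the text


def approx_mass_kd(n_residues):
    """Approximate protein mass in Kd from its number of residues."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000


bpti_kd = approx_mass_kd(58)   # BPTI has 58 amino acids
fusion_kd = bpti_kd + 60       # plus the quoted 60 Kd gene III protein
print(round(bpti_kd, 1))       # 6.4
```

On this estimate the BPTI insert contributes about 6.4 Kd, in line with the figure used in the text, and the predicted fusion of roughly 66 Kd is close to the 67 Kd band observed on the Western blot.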

NEUTRALIZATION OF PHAGE TITER WITH AGAROSE-IMMOBILIZED 
ANHYDRO-TRYPSIN 

Anhydro-trypsin is a derivative of trypsin in which 
the active site serine has been converted to 
dehydroalanine. Anhydro-trypsin retains the specific 
binding of trypsin but not the protease activity. Unlike 
polyclonal antibodies, anhydro-trypsin is not expected to 
bind unfolded BPTI or incomplete fragments. 

Phage MK-BPTI and MK were diluted to a concentration 
of 1.4 x 10^12 particles per ml in TBS buffer (PARM88) 
containing 1.0 mg/ml BSA. Thirty microliters of diluted 
phage were added to 2, 5, or 10 microliters of a 50% 
slurry of agarose-immobilized anhydro-trypsin (Pierce 
Chemical Co., Rockford, IL) in TBS/BSA buffer. Following 
incubation at 25°C, aliquots were removed, diluted in ice 
cold LB broth and titered for plaque-forming units on a 
lawn of XL1-Blue(TM) cells. Table 114 illustrates that 
incubation of the MK-BPTI phage with immobilized 
anhydro-trypsin results in a very significant loss in 
titer over a four hour period, while no such effect is 
observed with the MK (control) phage. The reduction in 
phage titer is also proportional to the amount of 
immobilized anhydro-trypsin added to the MK-BPTI phage. 
Incubation with five microliters of a 50% slurry of 
agarose-immobilized streptavidin (Sigma, St. Louis, MO) in 
TBS/BSA buffer does not reduce the titer of either the 
MK-BPTI or MK phage. These data are consistent with the 
presentation of a correctly-folded, functional BPTI 
protein on the surface of the MK-BPTI phage but not on the 
MK phage. Unfolded or incomplete BPTI domains are not 
expected to bind anhydro-trypsin. Furthermore, unfolded 
BPTI domains are expected to be non-specifically sticky. 

NEUTRALIZATION OF PHAGE TITER WITH ANTI-BPTI ANTIBODY 

MK-BPTI and MK phage were diluted to a concentration 
of 4 x 10^8 plaque-forming units per ml in LB broth. 
Fifteen microliters of diluted phage were added to an 
equivalent volume of either rabbit anti-BPTI serum or 
normal rabbit serum (both diluted 10 fold in LB broth). 
Following incubation at 37°C, aliquots were removed, 
diluted by 10^4 in ice-cold LB broth and titered for 
plaque-forming units on a lawn of XL1-Blue(TM) cells. 

Incubation of the MK-BPTI phage with anti-BPTI serum 
results in a steady loss in titer over a two hour period, 
while no such effect is observed with the MK phage. As 
expected, normal rabbit serum does not reduce the titer of 
either the MK-BPTI or the MK phage. Prior incubation of 
the anti-BPTI serum with authentic BPTI protein, but not 
with an equivalent amount of E. coli protein, blocks the 
ability of the serum to reduce the titer of the MK-BPTI 
phage. These data are consistent with the presentation of 
BPTI-specific epitopes on the surface of the MK-BPTI phage 
but not the MK phage. More specifically, the data 
indicate that these BPTI epitopes are associated with the 
gene III protein and that association of this fusion 
protein with an anti-BPTI antibody blocks its ability to 
mediate the infection of bacterial cells. 

NEUTRALIZATION OF PHAGE TITER WITH TRYPSIN 

MK-BPTI and MK phage were diluted to a concentration 
of 4 x 10^8 plaque-forming units per ml in LB broth. 
Diluted phage were added to an equivalent volume of 
trypsin diluted to various concentrations in LB broth. 
Following incubation at 37°C, aliquots were removed, 
diluted by 10^4 in ice cold LB broth and titered for 
plaque-forming units on a lawn of XL1-Blue(TM) cells. 
Incubation of the MK-BPTI phage with 0.15 ng of trypsin 
results in a 70% loss in titer after a two hour period, 
while only a 15% loss in titer is observed for the MK 
phage. A reduction in the amount of trypsin added to 
phage results in a reduction in the loss of titer. 
However, at all trypsin concentrations investigated, the 
MK-BPTI phage are more sensitive to incubation with 
trypsin than the MK phage. An interpretation of these 
data is that association of the BPTI-III fusion protein 
displayed on the surface of the MK-BPTI phage with 
trypsin blocks its ability to mediate the infection of 
bacterial cells. 

The reduction in titer of phage MK by trypsin is an 
example of a phenomenon that is likely to be general: 
proteases, if present in sufficient quantity, will degrade 
proteins on the phage and reduce infectivity. The present 
application lists several means that can be used to 
overcome this problem. 

AFFINITY SELECTION SYSTEM 

Affinity Selection with Immobilized Anhydro-Trypsin 

MK-BPTI and MK phage were diluted to a concentration 
of 1.4 x 10^12 particles per ml in TBS buffer (PARM88) 
containing 1.0 mg/ml BSA. We added 4.0 x 10^10 phage to 5 
microliters of a 50% slurry of either agarose-immobilized 
anhydro-trypsin beads (Pierce Chemical Co.) or 
agarose-immobilized streptavidin beads (Sigma) in 
TBS/BSA. Following a 3 hour incubation at room 
temperature, the beads were pelleted by centrifugation for 
30 seconds at 5000 rpm in a microfuge and the supernatant 
fraction was collected. The beads were washed 5 times with 
TBS/Tween buffer (PARM88) and after each wash the beads 
were pelleted by centrifugation and the supernatant was 
removed. Finally, beads were resuspended in elution 
buffer (0.1 N HCl containing 1.0 mg/ml BSA adjusted to pH 
2.2 with glycine) and, following a 5 minute incubation at 
room temperature, the beads were pelleted by 
centrifugation. The supernatant was removed and 
neutralized by the addition of 1.0 M Tris-HCl buffer, 
pH 8.0. 
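
The volume of diluted phage stock corresponding to the stated input can be worked out from the two figures in this paragraph. A minimal sketch, assuming only the stock concentration of 1.4 x 10^12 particles per ml and the input of 4.0 x 10^10 phage; the function name is illustrative:

```python
def volume_ul(n_particles, particles_per_ml):
    """Volume in microliters that contains n_particles at a given titer."""
    return n_particles / particles_per_ml * 1000


vol = volume_ul(4.0e10, 1.4e12)
print(round(vol, 1))  # 28.6
```

About 28.6 microliters of the diluted stock therefore supplies the stated number of phage, which is comparable to the thirty-microliter aliquots used in the titer neutralization experiment.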

Aliquots of phage samples were applied to a Nytran 
membrane using a Schleicher and Schuell (Keene, NH) 
filtration minifold and phage DNA was immobilized onto the 
Nytran by baking at 80°C for 2 hours. The baked filter 
was incubated at 42°C for 1 hour in pre-wash solution 
(MANI82) and pre-hybridization solution (5Prime-3Prime, 
West Chester, PA). The 1.0 Kb NarI (base 1630)/XmnI (base 
2646) DNA fragment from MK RF was radioactively labelled 
with 32P-dCTP using an oligolabelling kit (Pharmacia, 
Piscataway, NJ). The radioactive probe was added to the 
Nytran filter in hybridization solution (5Prime-3Prime) 
and, following overnight incubation at 42°C, the filter 
was washed and subjected to autoradiography. 

The efficiency of this affinity selection system can 
be semi-quantitatively determined using the dot-blot 
procedure described elsewhere in the present application. 
Exposure of MK-BPTI-phage-treated anhydro-trypsin beads to 
elution buffer releases bound MK-BPTI phage. Streptavidin 
beads do not retain phage MK-BPTI. Anhydro-trypsin beads 
do not retain phage MK. In the experiment depicted in 
Table 115, we estimate that 20% of the total MK-BPTI phage 
were bound to 5 microliters of the immobilized 
anhydro-trypsin and were subsequently recovered by washing 
the beads with elution buffer (pH 2.2 HCl/glycine). Under 
the same conditions, no detectable MK-BPTI phage were 
bound and subsequently recovered from the streptavidin 
beads. The amount of MK-BPTI phage recovered in the 
elution fraction is proportional to the amount of 
immobilized anhydro-trypsin added to the phage. No 
detectable MK phage were bound to either the immobilized 
anhydro-trypsin or streptavidin beads and no phage were 
recovered with elution buffer. These data indicate that 
the affinity selection system described above can be 
utilized to select for phage displaying a specific folded 
protein (in this case, BPTI). Unfolded or incomplete BPTI 
domains are not expected to bind anhydro-trypsin. 

Affinity Selection with Anti-BPTI antibodies 

MK-BPTI and MK phage were diluted to a concentration 
of 1 x 10^10 particles per ml in Tris buffered saline 
solution (PARM88) containing 1.0 mg/ml BSA. 2 x 10^8 
phage were added to 2.5 µg of either biotinylated rabbit 
anti-BPTI IgG in TBS/BSA or biotinylated rabbit anti-mouse 
antibody IgG (Sigma) in TBS/BSA, and incubated overnight 
at 4°C. A 50% slurry of streptavidin-agarose (Sigma), 
washed three times with TBS buffer prior to incubation 
with 30 mg/ml BSA in TBS buffer for 60 minutes at room 
temperature, was washed three times with TBS/Tween buffer 
(PARM88) and resuspended to a final concentration of 50% 
in this buffer. Samples containing phage and biotinylated 
IgG were diluted with TBS/Tween prior to the addition of 
streptavidin-agarose in TBS/Tween buffer. Following a 60 
minute incubation at room temperature, 
streptavidin-agarose beads were pelleted by centrifugation 
for 30 seconds and the supernatant fraction was collected. 
The beads were washed 5 times with TBS/Tween buffer and 
after each wash, the beads were pelleted by centrifugation 
and the supernatant was removed. Finally, the 
streptavidin-agarose beads were resuspended in elution 
buffer (0.1 N HCl containing 1.0 mg/ml BSA adjusted to pH 
2.2 with glycine), incubated 5 minutes at room 
temperature, and pelleted by centrifugation. The 
supernatant was removed and neutralized by the addition of 
1.0 M Tris-HCl buffer, pH 8.0. 

Aliquots of phage samples were applied to a Nytran 
membrane using a Schleicher and Schuell minifold 
apparatus. Phage DNA was immobilized onto the Nytran by 
baking at 80°C for 2 hours. Filters were washed for 60 
minutes in pre-wash solution (MANI82) at 42°C, then 
incubated at 42°C for 60 minutes in Southern 
pre-hybridization solution (5Prime-3Prime). The 1.0 Kb 
NarI (1630 bp)/XmnI (2646 bp) DNA fragment from MK RF was 
radioactively labelled with 32P-α-dCTP using an 
oligolabelling kit (Pharmacia, Piscataway, NJ). Nytran 
membranes were transferred from pre-hybridization solution 
to Southern hybridization solution (5Prime-3Prime) at 
42°C. The radioactive probe was added to the hybridization 
solution and, following overnight incubation at 42°C, the 
filter was washed 3 times with 2 x SSC, 0.1% SDS at room 
temperature and once at 65°C in 2 x SSC, 0.1% SDS. Nytran 
membranes were subjected to autoradiography. The 
efficiency of the affinity selection system can be 
semi-quantitatively determined using the above dot blot 
procedure. Comparison of dots A1 and B1 or C1 and D1 
indicates that the majority of phage did not stick to the 
streptavidin-agarose beads. Washing with TBS/Tween buffer 
removes the majority of phage which are non-specifically 
associated with streptavidin beads. Exposure of the 
streptavidin beads to elution buffer releases bound phage 
only in the case of MK-BPTI phage which have previously 
been incubated with biotinylated rabbit anti-BPTI IgG. 
These data indicate that the affinity selection system 
described above can be utilized to select for phage 
displaying a specific antigen (in this case BPTI). We 
estimate an enrichment factor of at least 40 fold based on 
the calculation 

                    Percent MK-BPTI phage recovered
Enrichment Factor = -------------------------------
                    Percent MK phage recovered
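
As a minimal sketch of the enrichment-factor calculation, the ratio can be computed from the input and eluted titers of each phage. The function names and the phage counts below are hypothetical, chosen only to exercise the formula; they are not measurements from this Example.

```python
def percent_recovered(eluted_phage: float, input_phage: float) -> float:
    """Percent of the input phage that appears in the elution fraction."""
    if input_phage <= 0:
        raise ValueError("input_phage must be positive")
    return 100.0 * eluted_phage / input_phage


def enrichment_factor(pct_display: float, pct_control: float) -> float:
    """Percent recovery of display phage (MK-BPTI) over control phage (MK)."""
    if pct_control <= 0:
        raise ValueError("pct_control must be positive")
    return pct_display / pct_control


# Hypothetical titers (plaque-forming units), for illustration only.
pct_mk_bpti = percent_recovered(eluted_phage=3.0e7, input_phage=1.0e9)
pct_mk = percent_recovered(eluted_phage=6.0e5, input_phage=1.0e9)

print(f"MK-BPTI recovered: {pct_mk_bpti:.3f}%")
print(f"MK recovered:      {pct_mk:.3f}%")
print(f"Enrichment factor: {enrichment_factor(pct_mk_bpti, pct_mk):.1f}")
```

Because both recoveries are expressed as percentages of their own input, the ratio does not depend on the two phage being applied at equal titers.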

EXAMPLE III 

CHARACTERIZATION AND FRACTIONATION OF CLONALLY PURE
POPULATIONS OF PHAGE, EACH DISPLAYING A SINGLE CHIMERIC
APROTININ HOMOLOGUE/M13 GENE III PROTEIN:

This Example demonstrates that chimeric phage 
superimposed on the corresponding α carbons of trypsin,
rms deviation ≈0.5 Å.) Inspection of this model indicates
that TRP39 could interact with the loop of HNE that
comprises VAL99, ASN99a, and LEU99b. HIS is observed in
six cases; HIS is hydrophobic, aromatic, and in some ways
similar to TRP. LEU39 in EpiNE7.5 could also interact
with these residues if the loop moves a short distance.
GLU occurred twice while LYS, ARG, and GLN occurred once
each. In BPTI, the Cα of residue 39 is ≈10 Å from the Cα
of residue 15 so that TRP39 interacts with different
features of HNE than do the amino acids substituted at
position 15. Residue 34 is well separated from each of
the residues 15, 18, and 39; thus it contacts different
features on the HNE surface from these residues. Although
serine proteases are highly similar near the catalytic
site, the similarity diminishes rapidly outside this
conserved region. The specificity of serine proteases is
in fact determined by more interactions than the P1
residue. To make an inhibitor that is highly specific to
HNE, we must go beyond matching the requirement at P1.
Thus, the substitutions at 18 (determined in Example IV),
39, 34, and other non-P1 positions are invaluable in
customizing the EpiNE to HNE. When making an inhibitor
customized to a different serine protease, it is likely
that many, if not all, of these positions will be changed
to obtain high affinity and specificity. It is a major
advantage of the present method that many such derivatives
may be tested rapidly.

At position 34, all 20 amino acids were allowed.
Fourteen have been seen. LYS appeared seven times, GLU
five times, THR four times, LEU three times, GLY, ASP,
GLN, MET, ASN, and HIS twice each, and ARG, PRO, VAL, and
TYR once each. There were no instances of ALA, CYS, PHE,
ILE, SER, or TRP. No homologue of aprotinin with GLU,
GLY, or MET at 34 has been reported heretofore. Here, as
at position 39, the library contains an excess of LEU over
LYS and GLU. Thus, we infer that the prevalence of LYS,
GLU, THR, and LEU is related to tighter binding of EpiNEs
having these amino acids at position 34. The prevalence
of LYS is surprising, as there are no acidic groups on HNE
in the neighborhood. The Nζ of LYS34 could interact
with a main-chain carbonyl oxygen while the methylene
groups interact with ILE151 and/or PHE192. LEU34 could
interact with ILE151 and/or PHE192 while GLU34 could
interact with ARG147.

There has been little if any enrichment at positions
40 and 41. Alanine is somewhat preferred at 40;
ALA:GLY::19:16. Both ALA and GLY have been reported in
aprotinin homologues.

Position 41 shows a preponderance of LYS (12 occurrences)
and GLU (7), but all eight possibilities have
been seen. The overall distribution is LYS 12, GLU 7, ASP 4,
ASN 4, GLN 3, HIS 3, and TYR 2. Heretofore, no homologues of
aprotinin having GLU, GLN, HIS, or TYR at position 41 have
been reported.
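
The residue counts reported for positions 34, 40, and 41 can be cross-checked with a short tally. The sketch below only transcribes the counts given in the text (the variable names are illustrative) and checks that fourteen residue types were seen at position 34 and that the counts at the three positions add up to the same total.

```python
from collections import Counter

ALL_AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}

# Counts transcribed from the text above.
position_34 = Counter({
    "LYS": 7, "GLU": 5, "THR": 4, "LEU": 3,
    "GLY": 2, "ASP": 2, "GLN": 2, "MET": 2, "ASN": 2, "HIS": 2,
    "ARG": 1, "PRO": 1, "VAL": 1, "TYR": 1,
})
position_40 = Counter({"ALA": 19, "GLY": 16})
position_41 = Counter({
    "LYS": 12, "GLU": 7, "ASP": 4, "ASN": 4, "GLN": 3, "HIS": 3, "TYR": 2,
})

# Residue types never observed at position 34.
unseen_at_34 = sorted(ALL_AMINO_ACIDS - set(position_34))

for name, counts in [("34", position_34), ("40", position_40), ("41", position_41)]:
    print(f"position {name}: {len(counts)} residue types, {sum(counts.values())} observations")
print("not seen at 34:", ", ".join(unseen_at_34))
```

Equal totals at the three positions are what one would expect if every tally was taken over the same set of sequenced clones.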

One sequence, EpiNE7.25, contains an unexpected change
at position 47, SER to LEU. Heretofore, all homologues of
aprotinin reported have had either SER or THR at position
47. The side groups of SER and THR can form hydrogen
bonds to main-chain atoms at the beginning of the short α
helix.

The consensus sequence, LYS34, GLY36, TRP39, ALA40,
LYS41, was not observed. EpiNE7.23 is quite close,
differing only at position 40 where the preference for ALA
is very weak.

We tested EpiNE7.23 (the sequence closest to consensus)
against EpiNE7 on HNE beads. Figure 16 shows the
fractionation of strains of phage that display these two
EpiNEs. Phage that display EpiNE7 are eluted at higher pH
than are phage that display EpiNE7.23. Furthermore, more
of the EpiNE7.23 phage are retained than of the EpiNE7
phage. Note the peak at pH 2.25 in the EpiNE7.23 elution.
This suggests that EpiNE7.23 has a higher affinity for HNE
than does EpiNE7. In a similar way, we tested EpiNE7.4
and found that it is not retained on HNE as well as
EpiNE7. This is consistent with the fractionation not
being complete.

Further fractionation, characterization of clonally 
pure EpiNE7.nn strains, and biochemical characterization 
of soluble EpiNE7.nn derivatives will reveal which 
sequences in this collection have the highest affinity for 
HNE. 

Fractionation of the library involves a number of 
factors. Differential binding allows phage that display 
PBDs having the desired binding properties to be enriched. 
Differences in infectivity, plaque size, and phage yield
are related to differences in the sequence of the PBDs, 
but are not directly correlated to affinity for the 
target. These factors may reduce the effectiveness of the 
desired fractionation. An additional factor that may be 
present is differential abundance of PBD sequences in the 
initial library. One step we employ to reduce the effect 
of differential infectivity is to transduce cells with 
isolated phage rather than to infect them. In the first 
fractionation, we did not obtain sufficient material for 
transduction and so infected cells; this fractionation was 
successful. Because the parental sequence, EpiNE7, was 
selected for a sequence at residues 15 through 19 that
confers high affinity for HNE, we believe that many, if not
most, members of the KLMUT population have significant 
affinity for HNE. Thus the present fractionations must 
separate variants having very high affinity for HNE from 
aprotinin homologue to HNE. Different substitutions at
these positions are likely to confer different specificity
on those derivatives. One of the major advantages of the 
present invention is that many substitutions at several 
locations may be tested with an amount of effort not much 
greater than is required to test a single derivative by 
previously used methods. 

There exist a number of proteases produced by
lymphocytes. Neutrophil elastase is not the only lymphocytic
protease that degrades elastin. The protease p29 is
related to HNE. Screening the MYMUT and KLMUT libraries
against immobilized p29 is likely to allow isolation of an
aprotinin derivative having high affinity for p29.

EXAMPLE VII 

BPTI:VIII BOUNDARY EXTENSIONS. 

The aim of this work was to introduce peptide 
extensions between the C-terminus of the BPTI domain and 
the N-terminus of the M13 major coat protein within the 
those merely having high affinity for HNE. It is perhaps
relevant that BPTI-III MK phage are only partially eluted
from immobilized trypsin at pH 2.2; Kd(trypsin, BPTI) =
6.0 × 10^-14 M. Elution of EpiNE7-III MA phage from immobilized
HNE gives a peak at about pH 3.5 with some phage
appearing at lower pH; Kd(HNE, EpiNE7) < 1 × 10^-11 M. We
recycled phage that either were eluted at pH 2.0 or that
were retained after elution with pH 2.0 buffer. A large
percentage of EpiNE7-III MA phage would have been washed
away with the fractions at pHs less acid than 2.0. This,
together with the marked preferences at positions 39, 36,
and 34, strongly suggests that we have successfully
fractionated the KLMUT library on the basis of affinity
for HNE and that the EpiNE7.nn proteins have higher
affinity for HNE than does EpiNE7 or any other reported
aprotinin derivative.

Fractionation in a few stringent steps emphasizes the 
affinity of the PBD and allows isolation of variants that 
confer a small-plaque phenotype on cells (through low 
infectivity or by slowing cell growth) . More gradual 
fractionation allows observation of a wider variety of 
variants that show high affinity and favors sequences that 
start at low abundance. Gradual fractionation also favors 
selection of variants that do not confer a small-plaque 
phenotype; such variants may be easier to work with and 
are preferred for some purposes. In either case, it is 
preferred to fractionate until there is a manageable 
number of distinct isolates and to characterize these 
isolates as pure clones. Thus, it is desirable, in most 
cases, to fractionate a library in more than one way. 

None have identified positions 39 and 34 as key in
determining the affinity and specificity of aprotinin
homologues and derivatives for particular serine proteases.
None have suggested that tryptophan at 39 or charged
amino acids (LYS or GLU) at 34 will enhance binding of an




contained new sequences at this position.

A pool of phages containing the novel interface
pentapeptide extensions was collected by combining the
phage extracted from the plated plaques.

2. Adding multiple unit extensions to the fusion protein 
interface. 

The M13 gene III product contains 'stalk-like'
regions as implied by electron micrographic visualization
of the bacteriophage (LOPE85). The predicted amino acid
sequence of this protein contains repeating motifs, which
include:

glu.gly.gly.gly.ser (EGGGS) seven times
gly.gly.gly.ser (GGGS) three times
glu.gly.gly.gly.thr (EGGGT) once.

The aim of this section was to insert, at the domain 
interface, multiple unit extensions which would mirror the 
repeating motifs observed in the III gene product. 

Two synthetic oligonucleotides were designed and
custom synthesized. GLY is encoded by four codons (GGN);
when translated in the opposite direction, these codons
give rise to THR, PRO, ALA, and SER. The third base of
these codons was picked so that translation of the
oligonucleotide in the opposite direction would encode
SER. When annealed, the synthetic oligonucleotides give
the following unit duplex sequence (an EGGGS linker):

       E   G   G   G   S
5'  C.GAG.GGA.GGA.GGA.TC      3'
3'     TC.CCT.CCT.CCT.AGG.C   5'
      (L) (S) (S) (S) (G)

The duplex has a common two base pair 5' overhang
(GC) at either end of the linker which allows for both the
ligation of multiple units and the ability to clone into
the unique NarI recognition sequence present in OCVs
fusion protein. The reasons for this were twofold:
firstly to alter potential protease cleavage sites at the
interdomain boundary (as evidenced by an apparent instability
of the fusion protein) and secondly to increase
interdomain flexibility.

1) Insertion of a variegated pentapeptide at the
BPTI:VIII interface.

The gene shown in Table 113 was modified by insertion
of five RVT codons between codons 81 and 82. Two synthetic
oligonucleotides were designed and custom synthesized.
The first consisted of, from 5' to 3': a) from base 2 of
codon 77 to the end of codon 81, b) five copies of RVT,
and c) from codon 82 to the second base of codon 94. The
second comprised 20 bases complementary to the 3' end of
the first oligonucleotide. Each RVT codon allows one of
the amino acids [T, N, S, A, D, and G] to be encoded.
This variegation codon was picked because: a) each amino
acid occurs once, and b) all these amino acids are thought
to foster a flexible linker. When annealed, the primed
variegated oligonucleotide was converted to double-stranded
DNA using standard methods.
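
The claim that an RVT codon encodes each of T, N, S, A, D, and G exactly once can be checked by expanding the degenerate codon against the standard genetic code. The following is a minimal sketch; the helper names are illustrative, and the final figure assumes the five RVT codons vary independently.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC ambiguity codes needed here: R = A or G; V = A, C, or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "V": "ACG"}


def expand(degenerate_codon):
    """List every concrete codon matched by a degenerate codon such as RVT."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate_codon))]


def encoded_residues(degenerate_codon):
    """Count how many concrete codons encode each amino acid."""
    return Counter(CODON_TABLE[c] for c in expand(degenerate_codon))


rvt = encoded_residues("RVT")
print(sorted(rvt.items()))
# Number of distinct pentapeptides from five independent RVT codons.
print(len(rvt) ** 5)
```

Because every residue is encoded by exactly one codon, the number of DNA sequences and the number of pentapeptide sequences in the variegated insert are the same.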

The duplex was digested with restriction enzymes SfiI
and NarI and the resulting 45 base-pair fragment was
ligated into a similarly cleaved OCV, M13MB48 (Example
I.l.iii.a). The ligated material was transfected into
competent E. coli cells (strain XL1-Blue(TM)) and plated
onto a lawn of the same cells on normal bacterial growth
plates to form plaques. The bacteriophage contained
within the plaques were analyzed using standard methods of
nitrocellulose lifts and probing using a 32P-labeled
oligonucleotide complementary to the DNA sequence encoding
the fusion protein interface. Approximately 80% of the
plaques probed poorly with this oligonucleotide and hence
M13MB48 and GemMB42. This site is positioned within 1
codon of the DNA encoding the interface. The cloning of
an EGGGS linker (or multiple linker) into the vector NarI
site destroys this recognition sequence. Insertion of the
EGGGS linker in reverse orientation leads to insertion of
GSSSL into the fusion protein.
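
The orientation behaviour of the linker can be illustrated with a short sketch. It assumes the two strands of the unit duplex read, 5' to 3', CGAGGGAGGAGGATC and CGGATCCTCCTCCCT, and that the insert sits between the GG and CGCC halves of a cleaved NarI site (GG^CGCC) with the reading frame starting at that GG. The helper names are illustrative.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

TOP = "CGAGGGAGGAGGATC"     # upper strand of the unit duplex, 5'->3'
BOTTOM = "CGGATCCTCCTCCCT"  # lower strand of the unit duplex, 5'->3'
NARI_SITE = "GGCGCC"


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from the first base; trailing bases are ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def insert_at_nari(insert):
    """Place an insert between the two halves of a cleaved NarI site (GG^CGCC)."""
    return "GG" + insert + "CGCC"


forward = insert_at_nari(TOP)
reverse = insert_at_nari(BOTTOM)
print(forward, translate(forward))
print(reverse, translate(reverse))

# GLY codons (GGN) read on the opposite strand.
opposite = {codon: translate(reverse_complement(codon))
            for codon in ("GGT", "GGC", "GGA", "GGG")}
print(opposite)
```

In this sketch the forward insert reads through as G-EGGGS-A and the reverse insert as G-GSSSL-A, and neither product retains the GGCGCC recognition sequence, in line with the statements above.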

Addition of a single EGGGS linker at the NarI site of
the gene shown in Table 113 leads to the following gene:

 79  80  80a 80b 80c 80d 80e 81  82  83  84
  G   G   E   G   G   G   S   A   A   E   G
 GGT.GGC.GAG.GGA.GGA.GGA.TCC.GCC.GCT.GAA.GGT



Note that there is no preselection for the orientation
of the linker(s) inserted into the OCV and that
multiple linkers of either orientation (with the predicted
EGGGS or GSSSL amino acid sequence) or a mixture of
orientations (inverted repeats of DNA) could occur.

A ladder of increasingly large multiple linkers was
established by annealing and ligating the two starting
oligonucleotides containing different proportions of 5'
phosphorylated and non-phosphorylated ends. The logic
behind this is that ligation proceeds from the 3' unphosphorylated
end of an oligonucleotide to the 5' phosphorylated
end of another. The use of a mixture of phosphorylated
and non-phosphorylated oligonucleotides allows for
an element of control over the extent of multiple linker
formation. A ladder showing a range of insert sizes was
readily detected by agarose gel electrophoresis, spanning
15 bp (1 unit duplex, 5 amino acids) to greater than 600
base pairs (40 ligated linkers, 200 amino acids).
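
The sizes quoted for the ladder follow directly from the 15 bp unit length. A minimal sketch of the arithmetic (names are illustrative):

```python
UNIT_BP = 15   # one EGGGS linker unit, in base pairs
CODON_BP = 3   # base pairs per codon


def linker_size(units):
    """Return (base pairs, amino acids) for a concatemer of `units` linker units."""
    if units < 1:
        raise ValueError("units must be at least 1")
    base_pairs = units * UNIT_BP
    return base_pairs, base_pairs // CODON_BP


for n in (1, 8, 40):
    bp, aa = linker_size(n)
    print(f"{n:>2} unit(s): {bp:>3} bp, {aa:>3} amino acids")
```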

Large inverted repeats can lead to genetic instability.
Thus we chose to remove them, prior to ligation
into the OCV, by digesting the population of multiple
linkers with the restriction enzymes AccIII or XhoI, since
the linkers, when ligated 'head-to-head' or 'tail-to-tail',
generate these recognition sequences. Such a
digestion significantly reduces the range in sizes of the
multiple linkers to between 1 and 8 linker units (i.e.
between 5 and 40 amino acids in steps of 5), as assessed
by agarose gel electrophoresis.
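
Why this digestion removes inverted repeats can be seen by joining two linker units in each relative orientation and searching the junction for the AccIII (TCCGGA) and XhoI (CTCGAG) recognition sequences. The sketch below assumes the two strands of the unit duplex read, 5' to 3', CGAGGGAGGAGGATC (EGGGS orientation) and CGGATCCTCCTCCCT (GSSSL orientation); the names are illustrative.

```python
TOP = "CGAGGGAGGAGGATC"     # unit read in the EGGGS orientation, 5'->3'
BOTTOM = "CGGATCCTCCTCCCT"  # unit read in the GSSSL orientation, 5'->3'

SITES = {"AccIII": "TCCGGA", "XhoI": "CTCGAG"}


def sites_in(seq):
    """Names of the restriction sites present in seq, in alphabetical order."""
    return sorted(name for name, site in SITES.items() if site in seq)


junctions = {
    "EGGGS + EGGGS": TOP + TOP,
    "GSSSL + GSSSL": BOTTOM + BOTTOM,
    "EGGGS + GSSSL": TOP + BOTTOM,
    "GSSSL + EGGGS": BOTTOM + TOP,
}

for label, seq in junctions.items():
    found = sites_in(seq)
    print(f"{label}: {', '.join(found) if found else 'no site'}")
```

Under these assumptions only the mixed-orientation junctions carry a site, so digestion cuts at inverted repeats and leaves same-orientation concatemers intact.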

The linkers were ligated (as a pool of different
insert sizes or as gel-purified discrete fragments) into
NarI cleaved OCVs M13MB48 or GemMB42 using standard
methods. Following ligation the restriction enzyme NarI
was added to remove the self-ligating starting OCV (since
linker insertion destroys the NarI recognition sequence).
This mixture was used to transform competent XL1-Blue
cells and appropriately plated for plaques (OCV M13MB48)
or ampicillin resistant colonies (OCV GemMB42).

The transformants were screened using dot blot DNA
analysis with one of two 32P-labeled oligonucleotide
probes. One probe consisted of a sequence complementary
to the DNA encoding the P1 loop of BPTI while the second
had a sequence complementary to the DNA encoding the
domain interface region. Suitable linker candidates would
probe positively with the first probe and negatively or
poorly with the second. Plaque purified clones were used
to generate phage stocks for binding analyses and BPTI
display while the RF DNA derived from phage infected
bacterial cells was used for restriction enzyme analysis
and sequencing. Representative insert sequences of
selected clones analyzed are as follows:

M13.3X4     (GG)C.GGA.TCC.TCC.TCC.CT(C.GCC)
                  gly ser ser ser leu

M13.3X7     (GG)C.GAG.GGA.GGA.GGA.TC(C.GCC)
                  glu gly gly gly ser

M13.3X11    (GG)C.GAG.GGA.GGA.GGA.TCC.GGA.TCC.TCC.
                  glu gly gly gly ser gly ser ser
                  TCC.CTC.GGA.TCC.TCC.TCC.CT(C.GCC)
                  ser leu gly ser ser ser leu

These highly flexible oligomeric linkers are believed to 
be useful in joining a binding domain to the major coat 
(gene VIII) protein of filamentous phage to facilitate the 
display of the binding domain on the phage surface. They 
may also be useful in the construction of chimeric OSPs 
for other genetic packages as well.

EXAMPLE VIII 
BACTERIAL EXPRESSION VECTORS. 

The expression vectors were designed for the bac- 
terial production of BPTI analogues resulting from the 
mutagenesis and screening for variants with specific 
binding properties. The expression vectors used are 
derivatives of the OCV's M13MB48 and GemMB42. The
conversion was achieved by replacing the first codon of 
the mature VIII gene (codon 82 as shown in Table 113) with 
a translational stop codon by site specific mutagenesis. 

The salient points of the expression vector composition
are identical to those of the parent OCV's, namely a
lacUV5 promoter (hence IPTG induction), ribosome binding
site, initiating methionine, phoA signal peptide and
transcriptional termination signal (see Table 113). The
placement of the stop codon allows for the expression of
only the first half of the fusion protein. The Gem-based
expression system, containing the genes encoding BPTI 
analogues, is stored as plasmid DNA, being freshly 
transfected into cells for expression of the analogue 
protein. The M13-based expression system is stored as 
both RF DNA and as phage stocks. The phage stocks are 
used to infect fresh bacterial cells for expression of the 
protein of interest. 
Bacterial Expression of BPTI and Analogues.

i. Gem-based expression vector and protocol.

The Gem-based expression vector is a derivative of
the OCV GemMB42 (Example I and Table 113). This vector, at
least when it contains the BPTI or analogue genes, has
demonstrated a degree of insert instability on prolonged
growth in liquid culture. To reduce the risk of this the
following protocol is used.

Expression vector DNA (containing the BPTI or
analogue gene) is transfected into the E. coli strain
XL1-Blue(TM), which is plated on bacterial plates containing
ampicillin and allowed to incubate overnight at 37 °C
to give a dense population of colonies. The colonies are
scraped from the plate with a glass spreader in 1 ml of
NZCYM medium and combined with the scraped cells from
other duplicate plates. This stock of cells is diluted
approximately one hundred fold into NZCYM liquid medium
containing ampicillin (100 µg per ml) and allowed to grow
in a shaking incubator to a cell density of approximately
half log (absorbance of 0.3 at 600 nm). IPTG is added to a
final concentration of 0.5 mM and the induced culture
allowed to grow for a further two hours when it is
processed as described below.

ii. M13-based expression vector and protocol.

The M13-based expression vector is derived from OCV
M13MB48 (Example I). The BPTI gene (or analogue) is
contained within the intergenic region and its transcription
is under the control of a lacUV5 promoter, hence IPTG
inducible. The expression vector, containing the gene of
interest, is maintained and utilized as a phage stock.
This method enables a potentially lethal or deleterious 
gene to be supplied to a bacterial culture and gene 
induction to occur only when the bacterial culture has 
Regardless of the reason, the error rate is extremely low
considering only 1 unexpected alteration was observed
after sequencing 20 codons in 19 different clones.
Furthermore, the value of such a mutation is not diminished
by its accidental nature.

Some of the EpiNE clones are identical. The sequences
of EpiNE1, EpiNE3, and EpiNE7 appear a total of 4, 6
and 5 times respectively. Assuming the 1745 potentially
different DNA sequences encoded by the MYMUT oligonucleotide
were present at equal frequency in the fusion phage
library, the frequent appearance of the sequences for
clones EpiNE1, EpiNE3, and EpiNE7 may have important
implications. EpiNE1, EpiNE3, and EpiNE7 fusion phage may
display BPTI variants with the highest affinity for HNE of
all the 1000 potentially different BPTI variants in the
MYMUT library.

An examination of the sequences of the EpiNE clones
is illuminating. A strong preference for either VAL or
ILE at the P1 position (residue 15) is indicated with VAL
being favored over ILE by 14 to 6. In the MYMUT library,
VAL at position 15 is approximately twice as prevalent as
ILE. No examples of LEU, PHE, or MET at the P1 position
were observed although the MYMUT oligonucleotide has the
potential to encode these residues at P1. This is
consistent with the observation that BPTI variants with
single amino acid substitutions of LEU, PHE, or MET for
LYS15 exhibit a significantly lower affinity for HNE than
their counterparts containing either VAL or ILE (BECK88b).

PHE is strongly favored at position 17, appearing in
12 of 20 codons. MET is the second most prominent residue
at this position but it only appears when VAL is present
at position 15. At position 18 PHE was observed in all 20
clones sequenced even though the MYMUT oligonucleotide is
capable of encoding other residues at this position. This
result is quite surprising and could not be predicted from
previous mutational analysis of BPTI, model building, or
on any theoretical grounds. We infer that the presence of
PHE at position 18 significantly enhances the ability of each
of the EpiNEs to bind to HNE. Finally at position 19, PRO
appears in 10 of 20 codons while SER, the second most
prominent residue, appears at 6 of 20 codons. Of the
residues targeted for mutagenesis in the present study,
residue 19 is the nearest to the edge of the interaction
surface of a PEPI with HNE. Nevertheless, a preponderance
of PRO is observed and may indicate that PRO at 19, like
PHE at 18, enhances the binding of these proteins to HNE.
Interestingly, EpiNE5 appears only once and differs from
EpiNE1 only at position 19; similarly, EpiNE6 differs from
EpiNE3 only at position 19. These alterations may have
only a minor effect on the ability of these proteins to
interact with HNE. This is supported by the fact that the
pH elution profiles for EpiNE5 and EpiNE6 are very similar
to those of EpiNE1 and EpiNE3 respectively.

Only EpiNE2 and EpiNE8 exhibit pH profiles which
differ from those of the other selected clones. Both
clones contain LYS at position 19 which may restrict the
interaction of BPTI with HNE. However, we cannot exclude
the possibility that other alterations within EpiNE2 and
EpiNE8 (R15L and Y21S respectively) influence their
affinity for HNE.

EpiNE7 was expressed as a soluble protein and
analyzed for HNE inhibition activity by the fluorometric
assay of Castillo et al. (CAST79); the data were analyzed
by the method of Green and Work (GREE53). Preliminary
results indicate that Kd(HNE, EpiNE7) < 8 × 10^-12 M, i.e. at
least 7.5-fold lower than the lowest reported for a
BPTI derivative with respect to HNE.



C. Summary

Taken together, these data show that the alterations
which appear in the P1 region of the EPI mutants confer
the ability to bind to HNE and hence be selected through
the fractionation process. That the sequences of EpiNE1,
EpiNE3, and EpiNE7 appear frequently in the population of
selected clones suggests that these clones display BPTI
variants with the highest affinity for HNE of any of the
1000 potentially different variants in the MYMUT library.
Furthermore, that pH conditions less than 4.0 are required
to elute these fusion phage from immobilized HNE suggests
that they display BPTI variants having a higher affinity
for HNE than BPTI(K15V,R17L). EpiNE7 exhibits a lower Kd
toward HNE than does BPTI(K15V,R17L); EpiNE1 and EpiNE3
are also expected to exhibit lower Kds for HNE than
BPTI(K15V,R17L). It is possible that all of the listed
EpiNEs have lower Kds than BPTI(K15V,R17L).

Position 18 has not previously been identified as a
key position in determining specificity or affinity of
aprotinin homologues or derivatives for particular serine
proteases. None have reported or suggested that phenylalanine
at position 18 will confer specificity and high
affinity for HNE. One of the powerful advantages of the
present invention is that many diverse amino-acid sequences
may be tested simultaneously.


EXAMPLE V 

SCREENING OF THE MYMUT LIBRARY FOR BINDING TO CATHEPSIN G
BEADS.

We fractionated the MYMUT library over immobilized
human Cathepsin G to find an engineered protease inhibitor
having high affinity for Cathepsin G, hereafter designated
as an Epic. The details of phage binding, elution of
bound phage with buffers of decreasing pH (pH profile),
titering of the phage contained in these fractions,
composition of the MYMUT library, and the preparation of
cathepsin G (Cat G) beads are essentially the same as
detailed in Example IV.

The pH profiles for the binding of two starting controls,
BPTI-III MK and EpiNE1, are shown in Figure 10.
BPTI-III MK phage, which contains wild type BPTI fused to
the III gene product, shows no apparent binding to Cat G
beads in this assay. EpiNE1 phage was obtained by
enrichment with HNE beads (Example IV and Table 208).
EpiNE1-III MK demonstrated little binding to Cat G beads
in the assay, although a small peak or shoulder is visible
in the pH 5 eluted fraction.

Figure 11 shows the pH profiles of the MYMUT library
phage when bound to Cat G beads. Library-Cat G interaction
was monitored using three cycles of binding, pH
elution, transduction of the pH 2 eluted phage, growth of
the transduced phage and rebinding of any selected phage
to Cat G beads, in an exact copy of that used to find
variants of BPTI which bound to HNE. In contrast to the
pH profiles elicited with HNE beads, little enhancement of
binding was observed for the same phage library when
cycled with Cat G beads (with the exception of a possible
'shoulder' developing in the pH 5 elutions).

To investigate the elution profile around the pH 5
point in more detail, the binding of phage taken from the
pH 4 eluted fraction (bound to Cat G beads) rather than
the previously used pH 2 fraction was examined. Figure 12
demonstrates a marked enhancement of phage binding to the
Cat G beads with an apparent elution peak of pH 5. The
binding, as a fraction of the input phage population,
increased with subsequent binding and elution cycles.

Individual phage clones were picked, grown and
analyzed for binding to Cat G beads. Figure 13 shows the
binding and pH profiles for the individual Cat G binding
clones (designated Epic variants). All clones exhibited
minor peaks, superimposed upon a gradual fall in bound
phage, at pH elutions of 5 (clones 1, 8, 10 and 11) or pH
4.5 (clone 7).

DNA sequencing of the Epic clones, shown in Table
209, demonstrated that the clones selected for binding to
Cat G beads represented a distinct subset of the available
sequences in the MYMUT library and a cluster of sequences
different from that obtained when enriched with HNE beads.
The P1 residue in the Epic mutants is predominantly MET,
with one example of PHE, while in BPTI it is LYS and in
the EpiNE variants it is either VAL or LEU. In the Epic
mutants residue 16 is predominantly ALA with one example
of GLY and residue 17 is PHE, ILE or LEU. Interestingly
residues 16 and 17 appear to pair off by complementary
size, at least in this small sample. The small GLY
residue pairs with the bulky PHE while the relatively
larger ALA residue pairs with the less bulky LEU and ILE.
The majority of the available residues in the MYMUT
library for positions 18 and 19 are represented in the
Epic variants.

Hence, a distinct subset of related sequences from
the MYMUT library has been selected for and demonstrated
to bind to Cat G. A comparison of the pH profiles
elicited for the Epic variants with Cat G and the EpiNE
variants for HNE indicates that the EpiNE variants have a
high affinity for HNE while the Epic variants have a
moderate affinity for Cat G. Nonetheless, the starting
molecule, BPTI, has virtually no detectable affinity for
Cat G and the selection of clones with a moderate affinity
is a significant finding.

EXAMPLE VI 

SECOND ROUND OF VARIEGATION OF EpiNE7 TO ENHANCE BINDING
TO HNE

A. MUTAGENESIS OF EpiNE7 PROTEIN IN THE LOOP COMPRISING
RESIDUES 34-41

In Example IV, we described engineered protease
inhibitors EpiNE1 through EpiNE8 that were obtained by
affinity selection. Modeling of the structure of the
BPTI-Trypsin complex (Brookhaven Protein Data Bank entry
1TPA) indicates that the EpiNE protein surface that
interacts with HNE is formed not only by residues 15-19
but also by residues 34-40 that are brought close to this
primary loop when the protein folds (HUBE74, HUBE75,
OAST88). Acting upon this assumption, we changed amino
acid residues in a second loop of the EpiNE7 protein to
find EpiNE7 derivatives having higher affinity for HNE.

In the complex of BPTI and trypsin found in Brookhaven
Protein Data Bank entry 1TPA ("1TPA complex"),
VAL34 contacts TYR151 and GLN192. (Residues in trypsin or
HNE are underscored to distinguish them from the
inhibitor.) In HNE, the corresponding residues are ILE151
and PHE192. ILE is smaller and more hydrophobic than TYR.
PHE is larger and more hydrophobic than GLN. Neither of
the HNE side groups can form hydrogen bonds. When side
groups larger than that of VAL are substituted at
position 34, interactions with residues other than 151
and 192 may be possible. In particular, an acidic residue
at 34 might interact with ARG147 of HNE, which corresponds
to SER147 of trypsin in 1TPA. Table 15 shows that, in 59
homologues of BPTI, 13 different amino acids have been
seen at position 34. Thus we allow all twenty amino acids
at 34.

Position 36 is not highly varied; only GLY, SER, and
ARG have been observed, with GLY by far the most prevalent.
In the 1TPA complex, GLY36 contacts HIS57 and GLN192.
HIS57 is conserved and GLN192 corresponds to PHE192 of
HNE. Adding a methyl group to GLY36 could increase
hydrophobic interactions with PHE192 of HNE. GLY36 is in
a conformation that most amino acids can achieve: φ = -79°
and ψ = -9° (Deisenhofer cited in CREI84, p. 222).

In the 1TPA complex, ARG39 contacts SER96, ASN97,
THR98, LEU99, GLN175, and TRP215. In HNE, all of the
corresponding residues are different: SER96 is deleted;
ASN97 corresponds to ASP97 (bearing a negative charge);
THR98 corresponds to PRO98; LEU99 corresponds to the
residues VAL99, ASN99a, and LEU99b; GLN175 is deleted; and
TRP215 corresponds to PHE215. Position 39 shows a
moderately high degree of variability with 7 different
amino acids observed, viz. ARG, GLY, LYS, GLN, ASP, PRO,
and MET. Having seen PRO (the most rigid amino acid), GLY
(the most flexible amino acid), LYS and ASP (basic and
acidic amino acids), we assume that all amino acids are
structurally compatible with the aprotinin backbone.
Because the context of residue 39 has changed so much, we
allow all 20 amino acids.

Position 40 is not highly variable; only GLY and ALA
have been observed (with similar frequency, 24:16).
Position 41 is moderately varied, showing ASN, LYS, ASP,
GLN, HIS, GLU, and TYR. The side groups of residues 40
and 41 are not thought to contact trypsin in the 1TPA
complex. Nevertheless, these residues can exert
electrostatic effects and can influence the dynamic
properties of residues 39, 38, and others. The choice of
residues 34, 36, 39, 40, and 41 to be varied
simultaneously illustrates the rule that the varied
residues should be able to touch one molecule of the
target material at one time or be able to influence
residues that touch the target. These residues are not
contiguous in sequence, nor are they contiguous on the
surface of EpiNE7. They can, nonetheless, all influence
the contacts between the EpiNE and HNE.

Amino acid residues VAL34, GLY36, MET39, GLY40, and
ASN41 were variegated as follows: any of 20 genetically
encodable amino acids at positions 34 and 39 (NNS codons
in which N is approximately equimolar A,C,T,G and S is
approximately equimolar C and G), GLY or ALA at positions
36 and 40 (GST codon), and [ASP, GLU, HIS, LYS, ASN, GLN,
TYR, or stop] at position 41 (NAS codon). Because the
PEPIs are displayed fused to gIII protein, DNA containing
stop codons will not give rise to infectious phage in
non-suppressor hosts.

For cassette mutagenesis, a 61 base long oligonucleotide
DNA population was synthesized that contained 32,768
different DNA sequences coding on expression for a total
of 11,200 amino acid sequences. This oligonucleotide
extends from the third base of codon 51 in Table 113 (the
middle of the StuI site) to base 2 of codon 70 (the EagI
site (identified as XmaIII in Table 113)).
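
The two library sizes quoted above can be checked by enumerating the degenerate codons. The following is a minimal sketch, assuming the standard genetic code and treating N as A/C/G/T and S as C/G as stated in the text; the helper names (expand, library_size) are illustrative and not part of the original disclosure.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[i]
    for i, (a, b, c) in enumerate(product(BASES, repeat=3))
}

# Degeneracy symbols used in the text (N = any base, S = C or G).
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "S": "CG"}


def expand(codon):
    """Return every concrete codon matched by a degenerate codon such as 'NNS'."""
    return ["".join(p) for p in product(*(DEGENERATE[b] for b in codon))]


def library_size(design):
    """Return (number of DNA sequences, number of stop-free protein sequences)."""
    dna, protein = 1, 1
    for codon in design:
        codons = expand(codon)
        residues = {CODON_TABLE[c] for c in codons} - {"*"}
        dna *= len(codons)
        protein *= len(residues)
    return dna, protein


# Positions 34, 36, 39, 40 and 41, in that order.
design = ["NNS", "GST", "NNS", "GST", "NAS"]
print(library_size(design))  # (32768, 11200), matching the figures in the text
```

The protein count excludes stop codons, which is consistent with the remark that stop-containing DNA does not yield infectious phage in non-suppressor hosts.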

We used a mutagenesis method similar to that described
by Cwirla et al. (CWIR90) and other standard DNA
manipulations described in Maniatis et al. (MANI82) and
Sambrook et al. (SAMB89). EpiNE7 RF DNA was restricted
with EagI and StuI, agarose gel purified, and
dephosphorylated using HK(™) phosphatase (Epicentre
Technologies). We prepared insert by annealing two small,
16 base and 17 base, phosphorylated synthetic DNA primers
to the phosphorylated 61 base long oligonucleotide
population described above. The resulting insert DNA
population had the following features: double stranded DNA
ends capable of regenerating upon ligation the EagI (5'
overhang) and StuI (blunt) restricted sites of the EpiNE7
RF DNA, and single stranded DNA in the central mutagenic
region. Insert and EpiNE7 vector DNA were ligated.
Ligation samples were used to transfect competent
XL1-Blue(™) cells which were subsequently plated for
formation of ampicillin resistant (Ap^R) colonies. The
resulting phage-producing, Ap^R colonies were harvested
and recombinant phage was isolated. By following these
procedures, a phage library of 1.2×10^5 independent
transformants was assembled. We estimated that 97.4% of
the approximately 3.3×10^4 possible DNA sequences were
represented:

    0.974 = (1 - exp{-1.2×10^5/32768})

The probability of observing the parental sequence is
higher than 0.974 because VAL is encoded by two of the
NNS codons:

    Probability of seeing (V34, G36, M39, G40, N41)
        = (1 - exp{-(1.2×10^5 × 2/32768)})
        = (1 - exp{-7.32})
        = (1 - 6.5×10^-4)
        = 0.99934

Furthermore, we expect that a small amount (for example, 1 
part in 1000) of uncut or once-cut and religated parental 
vector would come through the procedures used. Thus the 
parental sequence is almost certainly present in the 
library. This library is designated the KLMUT library. 
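
The representation estimates above follow a Poisson sampling argument. A minimal sketch, assuming each transformant carries one DNA sequence drawn uniformly from the 32,768 possibilities (the function name and the `copies` parameter are illustrative):

```python
import math


def fraction_represented(n_transformants, n_dna_sequences, copies=1):
    """Poisson estimate of the chance that a sequence encoded by `copies`
    of the possible DNA sequences occurs at least once in the library."""
    return 1.0 - math.exp(-n_transformants * copies / n_dna_sequences)


# Any single DNA sequence in the KLMUT library (about 0.974 in the text).
print(round(fraction_represented(1.2e5, 32768), 3))

# Parental protein sequence: VAL34 has two NNS codons, so two DNA
# sequences encode it (about 0.99934 in the text).
print(round(fraction_represented(1.2e5, 32768, copies=2), 5))
```

Unequal base incorporation during synthesis or biased ligation would shift these figures, so they are best read as approximate.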

B. AFFINITY SELECTION WITH IMMOBILIZED HUMAN NEUTROPHIL
ELASTASE

1) First Fractionation 

We added 1.1×10^8 plaque forming units of the KLMUT
library to 10 µl of a 50% slurry of agarose-immobilized
human neutrophil elastase beads (HNE from Calbiochem
cross-linked to Reacti-Gel(™) agarose beads from Pierce
Chemical Co. following manufacturer's directions) in
TBS/BSA. Following 3 hours incubation at room
temperature, the beads were washed and phage was eluted
as done in the selection of EpiNE phage isolates
(Example IV). The progression in lowering pH during the
elution was: pH 7.0, 6.0, 5.0, 4.5, 4.0, 3.5, 3.0, 2.5,
and 2.0. Beads carrying phage remaining after pH 2.0
elution were used to infect XL1-Blue(™) cells that were
plated to allow plaque formation. The 348 resulting
plaques were pooled to form a phage population for
further affinity selection. A population of phage
particles containing 6.0×10^[exponent illegible] plaque
forming units was added to 10 µl of a 50% slurry of
agarose-immobilized HNE beads in TBS/BSA and the above
selection procedure was repeated.

Following this second round of affinity selection, a
portion of the beads was mixed with XL1-Blue(™) cells and
plated to allow plaque formation. Of the resulting
plaques, 480 were pooled to form a phage population for a
third affinity selection. We repeated the selection
procedure described above using a population of phage
particles containing 3.0×10^9 plaque forming units.
Portions of the pH 2.0 eluate and of the beads were plated
with XL1-Blue(™) cells to allow formation of plaques.
Individual plaques were picked for preparation of RF DNA.
From DNA sequencing, we determined the amino acid sequence
in the mutated secondary loop of 15 EpiNE7-homolog clones.
The sequences are given in Table 210 as EpiNE7.1 through
EpiNE7.20. Three sequences were observed twice: EpiNE7.4
and EpiNE7.14; EpiNE7.8 and EpiNE7.9; and EpiNE7.10 and
EpiNE7.20. EpiNE7.4 was eluted at pH 2 while EpiNE7.14
was obtained by culturing HNE beads that had been washed
with pH 2 buffer. Similarly, EpiNE7.10 came from pH 2
elution but EpiNE7.20 came from beads. EpiNE7.8 and
EpiNE7.9 both came from pH 2 elution. Interestingly,
EpiNE7.8 is found in both the first and second
fractionations (EpiNE7.31 (vide infra)).

2) Second Fractionation 

The purpose of affinity fractionation is to reduce
diversity on the basis of affinity for the target. The
first enrichment step of the first fractionation reduced
the population from 3×10^4 possible DNA sequences to no
more than 348. This might be too severe and some of the
loss of diversity might not be related to affinity. Thus
we carried out a second fractionation of the entire KLMUT
library seeking to reduce the diversity more gradually.

We added 2.0×10^11 plaque forming units of the KLMUT
library to 10 µl of a 50% slurry of agarose-immobilized
HNE beads in TBS/BSA. Following 3 hours incubation at
room temperature, phage were eluted as described above.
We then transduced XL1-Blue(™) cells with portions of the
pH 2.0 eluate and plated for Ap^R colonies.

The resulting phage-producing colonies were harvested
to obtain amplified phage for further affinity selection.
A population of these phage particles containing
2.0×10^[exponent illegible] plaque forming units was added
to 10 µl of a 50% slurry of agarose-immobilized HNE beads
in TBS/BSA and incubated for 90 minutes at room
temperature. Phage were eluted as described above and
portions of the pH 2.0 eluate were used to transduce
XL1-Blue(™) cells. We plated the transductants for Ap^R
colonies and obtained amplified phage from the harvested
colonies.

In a third round of affinity selection, a population
of phage particles containing 3.0×10^10 plaque forming
units was added to 20 µl of a 50% slurry of
agarose-immobilized HNE beads and incubated for 2 hours
at room temperature. We eluted the phage with the
following pH washes: pH 7.0, 6.0, 5.0, 4.5, 4.0, 3.5,
3.25, 3.0, 2.75, 2.5, 2.25, and 2.0. After plating a
portion of the pH 2.0 eluate fraction for plaque
formation, we picked individual plaques for preparation
of RF DNA. DNA sequencing yielded the amino acid sequence
in the mutated secondary loop for 20 EpiNE7 homolog
clones. These sequences, together with EpiNE7, are given
in Table 210 as EpiNE7.21 through EpiNE7.40. The plaques
observed when EpiNEs are plated display a variety of
sizes. EpiNE7.21 through EpiNE7.30 were picked with
attention to plaque size: 7.21, 7.22, and 7.23 from small
plaques, 7.24 through 7.30 from plaques of increasing
size, with 7.30 coming from a large plaque. TRP occurs at
position 39 in EpiNE7.21, 7.22, 7.23, 7.25, and 7.30.
Thus plaque size does not correlate with the appearance
of TRP at 39. One sequence, EpiNE7.31, from this
fractionation is identical to sequences EpiNE7.8 and
EpiNE7.9 obtained in the first fractionation. EpiNE7.30,
EpiNE7.34, and EpiNE7.35 are identical, indicating that
the diversity of the library has been greatly reduced. It
is believed that these sequences have an affinity for HNE
that is at least comparable to that of EpiNE7 and probably
higher. Because the parental EpiNE7 sequence did not
recur, it is quite likely that some or all of the
EpiNE7.nn derivatives have higher affinity for HNE than
does EpiNE7.

3) Conclusions

One can draw some conclusions. First, because some
sequences have been isolated repeatedly, the fractionation
is nearly complete. The diversity has been reduced from
>10^4 to a few tens of sequences.

Second, the parental sequence has not recurred. At 
39, MET did not occur! At position 34 VAL occurred only 
once in 35 sequences. At 41, ASN occurred only 4 of 35 
times. At 40, GLY occurred 17 of 35 times. At position 
36, GLY occurred 34 of 35 times, indicating that ALA is 
undesirable here. EpiNE7.24 and EpiNE7.36 are most like
EpiNE7, having three of the varied residues identical to
EpiNE7.

Third, the results of the first and second
fractionation are similar. In the second fractionation,
the prevalence of TRP at position 39 is more marked (5/15
in fractionation #1, 14/20 in #2). It is possible that
the first fractionation lost some high-affinity EPIs
through under-sampling. Nevertheless, the first
fractionation was clearly quite successful.

Fourth, there are strong preferences at positions 39 
and 36 and lesser but significant preferences at positions 
34 and 41 with little preference at 40. 

Heretofore, no homologues of aprotinin have been
reported having ALA at 36. In the selected EpiNE7.nn
sequences, the preference for GLY over ALA at position 36
is 34:1. This preference is probably not due to
differences in protein stability. The process of the
present invention, as applied in the present example,
does not select against proteins on the basis of
stability so long as the protein does fold and function
at the temperature used in the procedure. ALA is probably
tolerated at position 36 well enough to allow those
proteins having ALA36 to fold and function; one example
was found having ALA36. It may be relevant that the sole
sequence having ALA36 also has GLY34. The flexibility of
GLY at 34 may allow the methyl of ALA at 36 to fit into
HNE in a way that is not possible when other amino acids
occupy position 34.
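
To gauge how unlikely a 34:1 split would be without selection, one can compute a binomial tail probability. This is only a rough sketch under two assumptions that are not in the original text: that the GST codon supplies GLY and ALA with equal frequency, and that the 35 sequenced clones are independent draws (some were in fact isolated more than once). The function name is illustrative.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance of seeing GLY in at least 34 of 35 clones if GLY and ALA
# were equally likely at position 36 and there were no selection.
p_unselected = binomial_tail(34, 35, 0.5)
print(f"{p_unselected:.2e}")  # on the order of 1e-9
```

Under those assumptions the observed split is very hard to attribute to sampling alone, which agrees with the conclusion that GLY is strongly preferred at position 36.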

At position 39, all 20 amino acids were allowed, but
only seven were seen. TRP is strongly preferred with 19
occurrences, HIS second with six occurrences, and LEU
third with 5 occurrences. No homologues of aprotinin have
been reported having either TRP or HIS at position 39 as
are now disclosed. Although LEU is represented in the NNS
codon thrice, TRP and HIS have but one codon each and
their prevalence is surprising. We constructed a model
having HNE (Brookhaven Protein Data Bank entry 1HNE) and
EpiNE7.9 spatially related as in the 1TPA complex. (The α
carbons of HNE of conserved internal residues were

that they might facilitate binding to HNE, it was not and
is not possible to predict which combination of these
amino acids will lead to high affinity for HNE. The
mutagenic oligonucleotide MYMUT was synthesized by Genetic
Design Inc. (Houston, Texas).

2) Construction of Library of Fusion Phage Displaying
Potential Engineered Protease Inhibitors

The single-stranded mutagenic MYMUT DNA was converted
to the double stranded form with compatible XhoI and StuI
ends and dephosphorylated with HK(™) phosphatase as
described above for the VAL1 oligonucleotide.
BPTI(MGNG)-III MA Rf DNA was digested with XhoI and StuI
for 3 hours at 37°C to ensure complete digestion. The 8.0
kb DNA fragment was purified by agarose gel
electrophoresis and Ultrafree-MC unit filtration. One µl
of the dephosphorylated MYMUT DNA (5 ng) was ligated to
50 ng of the 8.0 kb fragment derived from BPTI(MGNG)-III
MA Rf DNA. Under these conditions, the 10:1 molar ratio
of insert to vector was found to be optimal for the
generation of transformants. Ligation samples were
extracted with phenol, phenol/chloroform/IAA (25:24:1,
v:v:v) and chloroform/IAA (24:1, v:v) and DNA was ethanol
precipitated prior to electroporation. One µl of the
recovered ligation DNA was added to 40 µl of
electro-competent cells. Cells were shocked using a
Bio-Rad Gene Pulser device as described above.
Immediately following electroshock, 1.0 ml of SOC media
was added to the cells which were allowed to recover at
37°C for 60 minutes with shaking. The electroporated
cells were plated onto LB plates containing Ap to permit
the formation of colonies.
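
The relation between the masses used (5 ng insert, 50 ng of 8.0 kb vector) and the stated 10:1 molar ratio can be sketched as follows. The insert length is not given in this passage, so the code only back-calculates the length implied by the stated ratio; it assumes both DNAs are double stranded so that molar mass scales with length in base pairs. Function names are illustrative.

```python
def molar_ratio(insert_ng, insert_bp, vector_ng, vector_bp):
    """Insert:vector molar ratio for double-stranded DNA.

    Mass per mole is taken as proportional to length, so moles are
    proportional to mass divided by length.
    """
    return (insert_ng / insert_bp) / (vector_ng / vector_bp)


def implied_insert_length(ratio, insert_ng, vector_ng, vector_bp):
    """Insert length (bp) that would give the stated molar ratio."""
    return insert_ng * vector_bp / (vector_ng * ratio)


# Values from the text: 5 ng insert, 50 ng of an 8.0 kb vector, 10:1 ratio.
length_bp = implied_insert_length(10, 5, 50, 8000)
print(length_bp)                          # 80.0 bp implied, not a stated value
print(molar_ratio(5, length_bp, 50, 8000))
```

An implied insert of roughly 80 bp is plausible for a short mutagenic cassette, but it is an inference from the quoted ratio rather than a figure from the source.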

To assess the efficiency of the cassette mutagenesis
procedure, 39 transformants were picked at random and
phage present in culture supernatants were applied to a
Nytran membrane and probed using the Dot Blot Procedure.
Two Nytran membranes were prepared in this manner. The
first filter was allowed to hybridize to the CYSB
oligonucleotide which had previously been radiolabelled.
The second membrane was allowed to hybridize to the PRP1
oligonucleotide which had also been radiolabelled.
Filters were subjected to autoradiography following
washing under high stringency conditions. Of the 39 phage
samples applied to the membrane, all 39 hybridized to the
CYSB probe. This indicated that there was fusion phage in
the culture supernatants and that at least the DNA
encoding residues 35-47 appeared to be present in the
phage genomes. Only 11 of the 39 samples hybridized to
the PRP1 oligonucleotide indicating that 28% of the
transformants were probably the parental phage
BPTI(MGNG)-III MA used to generate the library. The
remaining 28 clones failed to hybridize to the PRP1 probe
indicating that substantial alterations were introduced
into the P1 region by cassette mutagenesis using the
MYMUT oligonucleotide. Of these 28 samples, all were
found to contain infectious phage indicating that
mutagenesis did not result in frame shift mutations which
would lead to the generation of defective gene III
products and non-infectious phage. (These 28
PEPI-displaying phage constitute a mini-library, the
fractionation of which is discussed below.) Hence the
overall efficiency of mutagenesis was estimated to be 72%
in those cases where ligation DNA was not subjected to
ApaI digestion prior to electroporation.

Bacterial colonies were harvested by overlaying
chilled LB plates containing Ap with 5 ml of ice cold LB
broth and scraping off cells using a sterile glass rod. A
total of 4899 transformants were harvested in this manner
of which 3299 were obtained by electroporation of ligation
samples which were not digested with ApaI. Hence we
estimate that 72% of these transformants (i.e., 2375)
represent mutants of the parental BPTI(MGNG)-III MA phage
derived by cassette mutagenesis of the P1 position. An
additional 1600 transformants were obtained by
electroporation of ligation samples which had been
digested with ApaI. If we assume that all of these clones
contain new sequences at the P1 position then the total
number of mutants in the pool of 4899 transformants is
estimated to be 2375 + 1600 = 3975. The total number of
potentially different DNA sequences in the MYMUT library
is 1728. We calculate that the library should display
about 90% of the potential engineered protease inhibitor
sequences as follows:

    N displayed = N possible × (1 - exp{-Libsize/N(DNA)})
                = 1000 × (1 - exp{-3975/1728}) = 900

    % of possible sequences displayed = 100 × (900/1000)
                                      = 90%
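
The chain of estimates leading to the 90% figure can be laid out step by step. This is a minimal sketch using only the numbers quoted in the text; the function names are illustrative, and the 72% mutant fraction is taken from the dot blot result rather than recomputed.

```python
import math


def mutants_in_pool(mutant_fraction, n_undigested, n_digested):
    """Estimated mutant transformants: a fraction of the undigested
    ligation transformants plus all of the ApaI-digested ones."""
    return round(mutant_fraction * n_undigested) + n_digested


def proteins_displayed(n_mutants, n_dna, n_protein):
    """Expected number of distinct protein sequences displayed."""
    return n_protein * (1.0 - math.exp(-n_mutants / n_dna))


libsize = mutants_in_pool(0.72, 3299, 1600)         # 2375 + 1600
displayed = proteins_displayed(libsize, 1728, 1000)
print(libsize)                                      # 3975
print(round(displayed), f"{displayed / 1000:.0%}")  # about 900, about 90%
```

The estimate treats every ApaI-digested transformant as a mutant, as the text assumes; if some parental vector survived digestion the true coverage would be slightly lower.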
3) Fractionation of a Mini-Library of Fusion Phage



We studied the fractionation of the mini-library of
28 PEPIs to establish the appropriate parameters for
fractionation of the entire MYMUT PEPI library. We
anticipated that fractionation could be easier when the
library of fusion phage was much less diverse than the
entire MYMUT library. Fewer cycles of fractionation might
be required to affinity purify a fusion phage exhibiting a
high affinity for HNE. Secondly, since the sequences of
all the fusion phage in the mini-library can be
determined, one can determine the probability of
selecting a given fusion phage from the initial
population.

Two ml of the culture supernatants of the 28 PEPIs
described above were pooled. Fusion phage were recovered,
resuspended in 300 mM NaCl, 100 mM Tris, pH 8.0, 1 mM EDTA
and stored on ice for 15 minutes. Insoluble material was
removed by centrifugation for 3 minutes in a microfuge at
4°C. The supernatant fraction was collected and PEPI
phage were precipitated with PEG-8000. The final phage
pellet was resuspended in TBS/BSA. Aliquots of the
recovered phage were titered for plaque-forming units on a
lawn of cells. The final stock solution consisted of 200
µl of fusion phage at a concentration of 5.6×10^12 pfu/ml.

a) First Enrichment Cycle 

Forty µl of the above phage stock was added to 10 µl
of a 50% slurry of HNE beads in TBS/BSA. The sample was
allowed to mix on a Labquake shaker for 1.5 hours. Five
hundred µl of TBS/BSA was added to the sample and after an
additional 5 minutes of mixing, the HNE beads were
collected by centrifugation. The supernatant fraction was
removed and the beads were resuspended in 0.5 ml of
TBS/0.5% Tween-20. Beads were washed for 5 minutes on the
shaker and recovered by centrifugation as above. The
supernatant fraction was removed and the beads were
subjected to 4 additional washes with TBS/Tween-20 as
described above to reduce non-specific binding of fusion
phage to HNE beads. Beads were washed twice as above with
0.5 ml of 50 mM sodium citrate pH 7.0, 150 mM NaCl
containing 1.0 mg/ml BSA. The supernatants from the two
washes were pooled. Subsequently, the HNE beads were
washed sequentially with a series of 50 mM sodium citrate,
150 mM NaCl, 1.0 mg/ml BSA buffers of pH 6.0, 5.0, 4.5,
4.0, 3.5, 3.0, 2.5 and 2.0. Two washes were performed at
each pH and the supernatants were pooled and neutralized
by the addition of 260 µl of 1 M Tris, pH 8.0. Aliquots
of each pH fraction were diluted in LB broth and titered
for plaque-forming units on a lawn of cells. The total
amount of fusion phage (as judged by pfu) appearing in
each pH wash fraction was determined.
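
The bookkeeping behind a pH elution profile can be sketched as below. Only the input volume and stock titer come from the text; the per-fraction pfu values are hypothetical placeholders chosen for illustration, and the function names are not part of the original procedure.

```python
def input_pfu(volume_ul, titer_pfu_per_ml):
    """Total phage applied to the beads, in plaque-forming units."""
    return volume_ul / 1000.0 * titer_pfu_per_ml


def percent_of_input(fraction_pfu, total_input_pfu):
    """Convert pfu recovered per pH fraction into percent of input phage."""
    return {ph: 100.0 * pfu / total_input_pfu for ph, pfu in fraction_pfu.items()}


# From the text: 40 µl of stock at 5.6×10^12 pfu/ml.
total = input_pfu(40, 5.6e12)

# HYPOTHETICAL recoveries per pH wash, for illustration only.
recovered = {7.0: 1.0e6, 6.0: 2.0e6, 5.0: 9.0e6, 4.5: 5.0e6,
             4.0: 3.0e6, 3.5: 2.0e6, 3.0: 1.0e6, 2.5: 5.0e5, 2.0: 2.0e5}

profile = percent_of_input(recovered, total)
peak_ph = max(profile, key=profile.get)   # pH wash with the largest recovery
print(f"{total:.2e} pfu applied; peak at pH {peak_ph}")
```

Expressing each fraction as a percentage of input is what allows profiles from different enrichment cycles, which start from different phage inputs, to be compared directly.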

Figure 7 illustrates that the largest percentage of
input phage which bound to the HNE beads was recovered in
the pH 5.0 fraction. The elution peak exhibits a trailing
edge on the low pH side suggesting that a small proportion
of the total bound fusion phage might elute from the HNE
beads at a pH < 5. BPTI(K15L)-III phage display a BPTI
variant with a moderate affinity for HNE (K_d = 2.9×10^-9
M) (BECK88b). Since BPTI(K15L)-III phage elute from HNE
beads as a peak centered on pH 4.75 and the highest peak
in the first passage of the mini-library over HNE beads is
centered on pH 5.0, we infer that many members of the
MYMUT PEPI mini-library display PEPIs having moderate to
high affinity for HNE.

To enrich for fusion phage displaying the highest
affinity for HNE, phage contained in the lowest pH
fraction (pH 2.0) from the first enrichment cycle were
amplified and subjected to a second round of
fractionation. Amplification involved the Transduction
Procedure described above. Fusion phage (2000 pfu) were
incubated with 100 µl of cells for 15 minutes at 37°C in
200 µl of 1 X Minimal A salts. Two hundred µl of 2 X LB
broth was added to the sample and cells were allowed to
recover for 15 minutes at 37°C with shaking. One hundred
µl portions of the above sample were plated onto LB plates
containing Ap. Five such transduction reactions were
performed yielding a total of 20 plates, each containing
approximately 350 colonies (7000 transformants in total).
Bacterial cells were harvested as described for the
preparation of the MYMUT library and fusion phage were
collected as described for the preparation of the
mini-library. A total of 200 µl of fusion phage (4.3×10^12
pfu/ml in TBS/BSA) derived from the pH 2.0 fraction from
the first passage of the mini-library was obtained in this
manner.

b) Second Enrichment Cycle

Forty µl of the above phage stock was added to 10 µl
of a 50% slurry of HNE beads in TBS/BSA. The sample was
allowed to mix for 1.5 hours and the HNE beads were washed
with TBS/BSA, TBS/0.5% Tween and sodium citrate buffers as
described above. Aliquots of neutralized pH fractions
were diluted and titered as described above.

The elution profile for the second passage of the
mini-library over HNE beads is shown in Figure 7. The
largest percentage of the input phage which bound to the
HNE beads was recovered in the pH 3.5 wash. A smaller
peak centered on pH 4.5 may represent residual fusion
phage from the first passage of the mini-library which
eluted at pH 5.0. The percentage of total input phage
which eluted at pH 3.5 in the second cycle exceeds the
percentage of input phage which eluted at pH 5.0 in the
first cycle. This is indicative of more avid binding of
fusion phage to the HNE matrix. Taken together, the
significant shift in the pH elution profile suggests that
selection for fusion phage displaying BPTI variants with
higher affinity for HNE occurred.

c) Third Cycle

Phage obtained in the pH 2.0 fraction from the second
passage of the mini-library were amplified as above and
subjected to a third round of fractionation. The pH
elution profile is shown in Figure 7. The largest
percentage of input phage was recovered in the pH 3.5 wash
as is the case with the second passage of the
mini-library. However, the minor peak centered on pH 4.5
is diminished in the third passage relative to the second
passage. Furthermore, the percentage of input phage which
eluted at pH 3.5 is greater in the third passage than in
the second passage. In comparison, the
BPTI(K15V,R17L)-III fusion phage elute from HNE beads as a
peak centered on pH 4.25. Taken together, the data suggest
that a significant selection for fusion phage displaying
PEPIs with high affinity for HNE occurred. Furthermore,
since more extreme pH conditions are required to elute
fusion phage in the third passage of the MYMUT library
relative to those conditions needed to elute
BPTI(K15V,R17L)-III MA phage, this suggests that those
fusion phage which appear in the pH 3.5 fraction may
display a PEPI with a higher affinity for HNE than the
BPTI(K15V,R17L) variant (i.e., < 6×10^-11 M).

d) Characterization of Selected Fusion Phage

The pH 2.0 fraction from the third passage of the
mini-library was titered and plaques were obtained on a
lawn of cells. Twenty plaques were picked at random and
phage derived from plaques were probed with the CYSB
oligonucleotide via the Dot Blot Procedure.
Autoradiography of the filter revealed that all 20 samples
gave a positive hybridization signal indicating that
fusion phage were present and the DNA encoding residues 35
to 47 of BPTI(MGNG) is contained within the recombinant
M13 genomes. Rf DNA was prepared for the 20 clones and
initial dideoxy sequencing revealed that 12 clones were
identical. This sequence was designated EpiNEα (Table
207). No DNA sequence changes were observed apart from
the planned variegation. Hence the cassette mutagenesis
procedure preserved the context of the planned variegation
of the pepi gene. The Dot Blot Procedure was employed to
probe all 20 selected clones from the pH 2.0 fraction from
the third passage of the mini-library with an
oligonucleotide homologous to the sequence of EpiNEα.
Following high stringency washing, autoradiography
revealed that all 20 selected clones were identical in the
P1 region. Furthermore dot blot analysis revealed that of
the 28 different phage samples pooled to create the
mini-library, only one contained the EpiNEα sequence.
Hence in just three passes of the mini-library over HNE
beads, 1 out of 28 input fusion phage was selected for and
appears as a pure population in the lowest pH fraction
from the third passage of the library. That the EpiNEα
phage elute at pH 3.5 while BPTI(K15V,R17L)-III MA phage
elute at a higher pH strongly suggests that the EpiNEα
protein has a significantly higher affinity than
BPTI(K15V,R17L) for HNE.

4) Fractionation of the MYMUT Library

a) Three cycles of enrichment 

The same procedure used above to fractionate the
mini-library was used to fractionate the entire MYMUT PEPI
library consisting of fusion phage displaying 1000
different proteins. The phage inputs for the first,
second and third rounds of fractionation were 4.0×10^11,
5.8×10^10, and 1.1×10^11 pfu respectively. Figure 8
illustrates that the largest percentage of input phage
which bound to the HNE matrix was recovered in the pH 5.0
wash in the first enrichment cycle. The pH elution
profile is very similar to that seen for the first passage
of the mini-library over HNE beads. A trailing edge is
also observed on the low pH side of the pH 5.0 peak;
however, this is not as prominent as that observed for the
mini-library. The percentage of input phage which eluted
in the pH 7.0 wash was greater than that eluted in the pH
6.0 wash. This is in contrast to the result obtained for
the first passage of the mini-library and may reflect the
presence of ≈20% parental BPTI(MGNG)-III MA phage in the
MYMUT library pool. These phage adhere to the HNE beads
weakly (if at all) and elute in the pH 7.0 fraction. That
no parent phage were present in the mini-library is
consistent with the absence of a peak at pH 7.0 in the
first passage of the mini-library.

Phage present in the pH 2.0 fraction from the first
passage of the MYMUT library were amplified as described
previously and subjected to a second round of
fractionation. The largest percentage of input phage
which bound to the HNE beads was recovered in the pH 3.5
wash (Figure 8). A minor peak centered on pH 4.5 was also
evident. The fact that more extreme pH conditions were
required to elute the majority of bound fusion phage
suggested that selection of fusion phage displaying PEPIs
with higher affinity for HNE had occurred. This was also
indicated by the fact that the total percentage of input
phage which appeared in the pH 3.5 wash in the second
enrichment cycle was 10 times greater than the percentage
of input which appeared in the pH 5.0 wash in the first
cycle.

Fusion phage from the pH 2.0 fraction of the second
pass of the MYMUT library were amplified and subjected to
a third passage over HNE beads. The proportion of fusion
phage appearing in the pH 3.5 fraction relative to that in
the 4.5 fraction was greater in the third passage than in
the second passage (Figure 8). Also the amount of fusion
phage appearing in the pH 3.5 fraction was higher in the
third passage than in the second passage. The fact that
wash conditions less than pH 4.25 were required to elute
bound fusion phage derived from the MYMUT library suggests
that the EpiNEs displayed by these phage possess a higher
affinity for HNE than the BPTI(K15V,R17L) variant.

b) Characterization of Selected Clones

The pH 2.0 fraction from the third enrichment cycle
of the MYMUT library was titered on a lawn of cells.
Twenty plaques were picked at random. Rf DNA was prepared
for each of the clones and fusion phage were collected by
PEG precipitation. Clonally pure populations of fusion
phage in TBS/BSA were prepared and characterized with
respect to their affinity for immobilized HNE. pH elution
profiles were obtained to determine the stringency of the
conditions required to elute bound fusion phage from the
HNE matrix. Figure 9 illustrates the pH profiles obtained
for EpiNE clones 1, 3, and 7. The pH profiles for all 3
clones exhibit a peak centered on pH 3.5. Unlike the pH
profile obtained for the third passage of the MYMUT
library, no minor peak centered on pH 4.5 is evident.
This is consistent with the clonal purity of the selected
EpiNE phage utilized to generate the profiles. The
elution peaks are not symmetrical and show a prominent
trailing edge on the low pH side. In all probability, the
10 minute elution period employed is inadequate to remove
bound fusion phage at the low pH conditions. EpiNE clones
1 through 8 have the following characteristics: five
clones (identified as EpiNE1, EpiNE3, EpiNE5, EpiNE6, and
EpiNE7) display very similar pH profiles centered on pH
3.5. The remaining 3 clones elute in the pH 3.5 to 4.0
range. There remains some diversity amongst the 20
randomly chosen clones obtained from the pH 2.0 fraction
of the third passage of the MYMUT library and these clones
might exhibit different affinities for HNE.

c) Sequences of the EpiNE Clones

The DNA sequences encoding the P1 regions of the
different EpiNE clones were determined by dideoxy sequencing
of Rf DNA. The sequences are shown in Table 208.
Essentially, only the codons targeted for mutagenesis
(i.e., 15 to 19) were altered as a consequence of cassette
mutagenesis using the MYMUT oligonucleotide. Only 1 codon
outside the target region was found to contain an unexpected
alteration. In this case, codon 21 of EpiNE8 was
altered from a tyrosine codon (TAT) to a SER codon (TCT)
by a single nucleotide substitution. This error could
have been introduced into the MYMUT oligonucleotide during
its synthesis. Alternatively, an error could have been
introduced when the single-stranded MYMUT oligonucleotide
was converted to the double-stranded form by Sequenase.

7) Effect of pH on the Dissociation of Bound BPTI-III MK
and BPTI(K15L)-III MA Phage from Immobilized Neutrophil
Elastase

The affinity of a given fusion phage for an immobilized
serine protease can be characterized on the basis of
the amount of bound fusion phage which elutes from the
beads by washing with a pH 2.2 buffer. This represents
rather extreme conditions for the dissociation of fusion
phage from beads. Since the affinity of the BPTI variants
described above for HNE is not high (K_d > 1·10^-9 M), it was
anticipated that fusion phage displaying these variants
might dissociate from HNE beads under less severe pH
conditions. Furthermore, fusion phage might dissociate
from HNE beads under specific pH conditions characteristic
of the particular BPTI variant displayed by the phage.
Low pH buffers providing stringent wash conditions might
be required to dissociate fusion phage displaying a BPTI
variant with a high affinity for HNE whereas neutral pH
conditions might be sufficient to dislodge a fusion phage
displaying a BPTI variant with a weak affinity for HNE.

Thirty µl of BPTI(K15L)-III MA phage (1.7·10^10 pfu/ml
in TBS/BSA) were added to 5 µl of a 50% slurry of immobilized
HNE also in TBS/BSA. Similarly, 30 µl of BPTI-III MA
phage (8.6·10^10 pfu/ml in TBS/BSA) were added to 5 µl of
immobilized HNE. The above conditions were chosen to
ensure that an approximately equivalent number of phage
particles were added to the beads. The samples were
incubated for 3 hours on a Labquake shaker. The beads
were washed with 0.5 ml of TBS/BSA for 5 min on the
shaker, recovered by centrifugation and the supernatant
was removed. The beads were washed with 0.5 ml of
TBS/0.1% Tween-20 for 5 minutes and recovered by centrifugation.
Four additional washes with TBS/0.1% Tween-20
were performed as described above. The beads were washed
as above with 0.5 ml of 100 mM sodium citrate, pH 7.0
containing 1.0 mg/ml BSA. The beads were recovered by
centrifugation and the supernatant was removed. Subsequently,
the HNE beads were washed sequentially with a
series of 100 mM sodium citrate, 1.0 mg/ml BSA buffers of
pH 6.0, 5.0, 4.0 and 3.0 and finally with the pH 2.2 elution
buffer described above. The pH washes were neutralized by
the addition of 1 M Tris, pH 8.0, diluted in LB broth and
titered for plaque-forming units on a lawn of cells.
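
As an illustration of how titers from such a wash series can be turned into a pH elution profile, the following Python sketch expresses each fraction as a percentage of input phage and reports the fraction holding the most phage. The input is computed from the volume and titer stated above for the BPTI(K15L)-III MA sample; the per-fraction titers are hypothetical placeholders (they are not the values of Table 203), and the function names are our own.

```python
def elution_profile(input_pfu, eluted_pfu_by_ph):
    """Return percent of input phage recovered in each pH fraction."""
    return {ph: 100.0 * pfu / input_pfu for ph, pfu in eluted_pfu_by_ph.items()}


def peak_ph(profile):
    """Return the pH of the fraction containing the most phage."""
    return max(profile, key=profile.get)


# Input from the text: 30 µl of phage at 1.7e10 pfu/ml.
input_pfu = 30e-3 * 1.7e10  # about 5.1e8 pfu

# HYPOTHETICAL total pfu recovered per wash, for illustration only.
hypothetical_eluted = {7.0: 1.0e4, 6.0: 2.0e4, 5.0: 9.0e5,
                       4.0: 6.0e5, 3.0: 5.0e4, 2.2: 1.0e4}

profile = elution_profile(input_pfu, hypothetical_eluted)
for ph in sorted(profile, reverse=True):
    print(f"pH {ph}: {profile[ph]:.4f}% of input")
print("peak fraction: pH", peak_ph(profile))
```

Expressing each fraction relative to input, rather than as raw pfu, is what allows samples with different input titers to be compared.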

Table 203 illustrates that a low percentage of the
input BPTI-III MK fusion phage adhered to the HNE beads
and was recovered predominantly in the pH 7.0 and 6.0
washes. By contrast, a significantly higher percentage of
the BPTI(K15L)-III MA phage bound to the HNE beads and was
recovered predominantly in the pH 5.0 and 4.0 washes.
Hence lower pH conditions (i.e., more stringent) are
required to dissociate BPTI(K15L)-III MA than BPTI-III MK
phage from immobilized HNE. The affinity of BPTI(K15L) is
over 1000 times greater than that of BPTI for HNE (based
on reported values (BECK88b)). Hence this suggests
that lower pH conditions are indeed required to dissociate
fusion phage displaying a BPTI variant with a higher
affinity for HNE.

8) Construction of BPTI(MGNG)-III MA Phage

The light chain of bovine inter-α-trypsin inhibitor
contains 2 domains highly homologous to BPTI. The amino
terminal proximal domain (called BI-8e) has been generated
by proteolysis and shown to be a potent inhibitor of HNE
(K_d = 4.4·10^-11 M) (ALBR83). By contrast, a BPTI variant
with the single substitution of LEU for LYS 15 exhibits a
moderate affinity for HNE (K_d = 2.9·10^-9 M) (BECK88b). It
has been proposed that the P1 residue is the primary
determinant of the specificity and potency of BPTI-like
molecules (BECK88b, LASK80 and works cited therein).
Although both BI-8e and BPTI(K15L) feature LEU at their
respective P1 positions, there is a 66-fold difference in
the affinities of these molecules for HNE. Structural
features, other than the P1 residue, must contribute to
the affinity of BPTI-like molecules for HNE.
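
The 66-fold figure follows directly from the two reported constants quoted above (ALBR83, BECK88b). A minimal sketch of the arithmetic, in which the dictionary and function names are illustrative assumptions of ours:

```python
# Reported constants for HNE, in mol/l, as quoted in the text.
REPORTED_CONSTANT_M = {
    "BI-8e": 4.4e-11,       # ALBR83
    "BPTI(K15L)": 2.9e-9,   # BECK88b
}


def fold_difference(weaker, tighter):
    """Ratio of the weaker binder's constant to the tighter binder's."""
    return REPORTED_CONSTANT_M[weaker] / REPORTED_CONSTANT_M[tighter]


ratio = fold_difference("BPTI(K15L)", "BI-8e")
print(f"BI-8e binds HNE about {ratio:.0f}-fold more tightly than BPTI(K15L)")
```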

A comparison of the structures of BI-8e and BPTI(K15L)
reveals the presence of three positively charged
residues at positions 39, 41, and 42 of BPTI which are
absent in BI-8e. These hydrophilic and highly charged
residues of BPTI are displayed on a loop which underlies
the loop containing the P1 residue and is connected to it
via a disulfide bridge. Residues within the underlying
loop (in particular residue 39) participate in the
interaction of BPTI with the surface of trypsin near the
catalytic pocket (BLOW72) and may contribute significantly
to the tenacious binding of BPTI to trypsin. However,
these hydrophilic residues might hamper the docking of
BPTI variants with HNE. In support of this hypothesis,
BI-8e displays a high affinity for HNE and contains no
charged residues in the region spanning residues 39-42.
Hence residues 39 through 42 of wild type BPTI were
replaced with the corresponding residues of the human
homologue of BI-8e. We anticipated that a BPTI derivative
containing the MET-GLY-ASN-GLY (MGNG) sequence would
exhibit a higher affinity for HNE than corresponding
derivatives which retain the sequence of wild type BPTI at
residues 39-42.

A double stranded oligonucleotide with AccI and EagI
compatible ends was designed to introduce the desired
alteration of residues 39 to 42 via cassette mutagenesis.
Codon 45 was altered to create a new XmnI site, unique in
the structure of the BPTI gene, which could be used to
screen for mutants. This alteration at codon 45 does not
alter the encoded amino-acid sequence. BPTI-III MA Rf DNA
was digested with AccI. Two oligonucleotides (CYSB and
CYST) corresponding to the bottom and top strands of the
mutagenic DNA were annealed and ligated to the AccI
digested BPTI-III MA Rf DNA. The sample was digested with
BglII and the 2.1 kb BglII/EagI fragment was purified.
BPTI-III MA Rf was also digested with BglII and EagI and
the 6.0 kb fragment was isolated and ligated to the 2.1 kb
BglII/EagI fragment described above. Ligation samples
were used to transfect competent cells which were plated
to permit the formation of plaques on a lawn of cells.
Phage derived from plaques were probed with a radioactively
labelled oligonucleotide (CYSB) using the Dot Blot
Procedure. Positive clones were identified by autoradiography
of the Nytran membrane after washing at high
stringency conditions. Rf DNA was prepared from Ap^R
cultures containing fusion phage which hybridized to the
CYSB probe. Restriction enzyme analysis and DNA sequencing
confirmed that codons 39-42 of BPTI had been altered.
The Rf DNA was designated BPTI(MGNG)-III MA.

9) Construction of BPTI(K15L,MGNG)-III MA

BPTI(MGNG)-III MA Rf DNA was digested with AccI and
the 5.6 kb fragment was purified. BPTI(K15L)-III MA was
digested with AccI and the 2.5 kb DNA fragment was
purified. The two fragments above were ligated together
and ligation samples were used to transfect competent
cells which were plated for plaque production. Large and
small plaques were observed on the plate. Representative
plaques of each type were picked and phage were probed
with the LEU1 oligonucleotide via the Dot Blot Procedure.
After the Nytran filter had been washed under high
stringency conditions, positive clones were identified by
autoradiography. Only the phage which hybridized to the
LEU1 oligonucleotide gave rise to the small plaques,
confirming an earlier observation that substitution of LEU
for LYS 15 substantially reduces phage infectivity.
Appropriate cultures containing phage which hybridized to
the LEU1 oligonucleotide were used to prepare Rf DNA.
Restriction enzyme analysis and DNA sequencing confirmed
that the K15L mutation had been introduced into
BPTI(MGNG)-III MA. This Rf DNA was designated
BPTI(K15L,MGNG)-III MA.

10) Effect of Mutation of Residues 39-42 of BPTI(K15L) on
its Affinity for Immobilized HNE

Thirty µl of BPTI(K15L,MGNG)-III MA phage (9.2·10^9
pfu/ml in TBS/BSA) were added to 5 µl of a 50% slurry of
immobilized HNE also in TBS/BSA. Similarly, 30 µl of
BPTI(K15L)-III MA phage (1.2·10^10 pfu/ml in TBS/BSA) were
added to immobilized HNE. The samples were incubated for
3 hours on a Labquake shaker. The beads were washed for 5
min with 0.5 ml of TBS/BSA and recovered by centrifugation.
The beads were washed 5 times with 0.5 ml of
TBS/0.1% Tween-20 as described above. Finally, the beads
were washed sequentially with a series of 100 mM sodium
citrate buffers of pH 7.0, 6.0, 5.5, 5.0, 4.75, 4.5, 4.25,
4.0 and 3.5 as described above. pH washes were neutralized,
diluted in LB broth and titered for plaque-forming
units on a lawn of cells.

Table 204 illustrates that almost twice as much of
the BPTI(K15L,MGNG)-III MA as BPTI(K15L)-III MA phage
bound to HNE beads. In both cases the pH 4.75 fraction
contained the largest proportion of the recovered phage.
This confirms that replacement of residues 39-42 of wild
type BPTI with the corresponding residues of BI-8e
enhances the binding of the BPTI(K15L) variant to HNE.

11) Fractionation of a Mixture of BPTI-III MK and
BPTI(K15L,MGNG)-III MA Fusion Phage

The observations described above indicate that
BPTI(K15L,MGNG)-III MA and BPTI-III MK phage exhibit
different pH elution profiles from immobilized HNE. It
seemed plausible that this property could be exploited to
fractionate a mixture of different fusion phage.

Fifteen µl of BPTI-III MK phage (3.92·10^10 pfu/ml in
TBS/BSA), equivalent to 8.91·10^7 Km^R transducing units,
were added to 15 µl of BPTI(K15L,MGNG)-III MA phage
(9.85·10^9 pfu/ml in TBS/BSA), equivalent to 4.44·10^7 Ap^R
transducing units. Five µl of a 50% slurry of immobilized
HNE in TBS/BSA was added to the phage and the sample was
incubated for 3 hours on a Labquake mixer. The beads were
washed for 5 minutes with 0.5 ml of TBS/BSA prior to being
washed 5 times with 0.5 ml of TBS/2.0% Tween-20 as
described above. Beads were washed for 5 minutes with 0.5
ml of 100 mM sodium citrate, pH 7.0 containing 1.0 mg/ml
BSA. The beads were recovered by centrifugation and the
supernatant was removed. Subsequently, the HNE beads were
washed sequentially with a series of 100 mM citrate
buffers of pH 6.0, 5.0 and 4.0. The pH washes were
neutralized by the addition of 130 µl of 1 M Tris, pH 8.0.

The relative proportion of BPTI-III MK and
BPTI(K15L,MGNG)-III MA phage in each pH fraction was evaluated by
determining the number of phage able to transduce cells to
Km^R as opposed to Ap^R. Fusion phage diluted in 1 X
Minimal A salts were added to 100 µl of cells (O.D.600 =
0.8 concentrated to 1/20 original culture volume) also in
Minimal A salts in a final volume of 200 µl. The sample was
incubated for 15 min at 37 °C prior to the addition of 200
µl of 2 X LB broth. After an additional 15 min incubation
at 37 °C, duplicate aliquots of cells were plated on LB
plates containing either Ap or Km to permit the formation
of colonies. Bacterial colonies on each type of plate
were counted and the data were used to calculate the number
of Ap^R and Km^R transducing units in each pH fraction. The
number of Ap^R transducing units is indicative of the
amount of BPTI(K15L,MGNG)-III MA phage in each pH fraction
while the total number of Km^R transducing units is
indicative of the amount of BPTI-III MK phage.

Table 205 illustrates that a low percentage of the
BPTI-III MK input phage (as judged by Km^R transducing
units) adhered to the HNE beads and was recovered predominantly
in the pH 7.0 fraction. By contrast, a significantly
higher percentage of the BPTI(K15L,MGNG)-III MA
phage (as judged by Ap^R transducing units) adhered to the
HNE beads and was recovered predominantly in the pH 4.0
fraction. A comparison of the total number of Ap^R and Km^R
transducing units in the pH 4.0 fraction shows that a
984-fold enrichment of BPTI(K15L,MGNG)-III MA phage over
BPTI-III MK phage was achieved. Hence, the above procedure can
be utilized to fractionate mixtures of fusion phage on the
basis of their relative affinities for immobilized HNE.

12) Construction of BPTI(K15V,R17L)-III MA

A BPTI variant containing the alterations K15V and
R17L demonstrates the highest affinity for HNE of any BPTI
variant described to date (K_d = 6·10^-11 M) (AUER89). As a
means of testing the selection system described herein, a
fusion phage displaying this variant of BPTI was generated
and used as a "reference" phage to characterize the
affinity for immobilized HNE of fusion phage displaying a
BPTI variant with a known affinity for free HNE. A 76 bp
mutagenic oligonucleotide (VAL1) was designed to convert
the LYS 15 codon (AAA) to a VAL codon (GTT) and the ARG 17
codon (CGA) to a LEU codon (CTG). At the same time codons
11, 12 and 13 were altered to destroy the ApaI site
resident in the wild type BPTI gene while creating a new
RsrII site, which could be used to screen for correct
clones.



The single stranded VAL1 oligonucleotide was converted
to the double stranded form following the procedure
described in Current Protocols in Molecular Biology
(AUSU87). One µg of the VAL1 oligonucleotide was annealed
to one µg of a 20 bp primer (MB8). The sample was heated
to 80 °C, cooled to 62 °C and incubated at this temperature
for 30 minutes before being allowed to cool to 37 °C. Two
µl of a 2.5 mM mixture of dNTPs and 10 units of Sequenase
(U.S.B., Cleveland, Ohio) were added to the sample and
second strand synthesis was allowed to proceed for 45
minutes at 37 °C. One hundred units of XhoI was added to
the sample and digestion was allowed to proceed for 2
hours at 37 °C in 100 µl of 1 X XhoI digestion buffer. The
digested DNA was subjected to electrophoresis on a 4% GTG
NuSieve agarose (FMC Bioproducts, Rockland, ME) gel and
the 65 bp fragment was excised and purified from melted
agarose by phenol extraction and ethanol precipitation. A
portion of the recovered 65 bp fragment was subjected to
electrophoresis on a 4% GTG NuSieve agarose gel for
quantitation. One hundred nanograms of the recovered
fragment was dephosphorylated with 1.9 µl of HK(™)
phosphatase (Epicentre Technologies, Madison, WI) at 37 °C
for 60 minutes. The reaction was stopped by heating at
65 °C for 15 minutes. BPTI-III MA Rf DNA was digested with
XhoI and StuI and the 8.0 kb fragment was isolated. One
µl of the dephosphorylation reaction (5 ng of double-stranded
VAL1 oligonucleotide) was ligated to 50 ng of the
8.0 kb XhoI/StuI fragment derived from BPTI-III MA Rf.
Ligation samples were subjected to phenol extraction and
DNA was recovered by ethanol precipitation. Portions of
the recovered ligation DNA were added to 40 µl of electrocompetent
cells which were shocked using a Bio-Rad Gene
Pulser device set at 1.7 kV, 25 µF and 800 Ω. One ml of
SOC media was immediately added to the cells which were
allowed to recover at 37 °C for one hour. Aliquots of the
electroporated cells were plated onto LB plates containing
Ap to permit the formation of colonies.

Phage contained within cultures derived from picked
Ap^R colonies were probed with two radiolabelled oligonucleotides
(PRP1 and ESP1) via the Dot Blot Procedure. Rf
DNA was prepared from cultures containing phage which
exhibited a strong hybridization signal with the ESP1
oligonucleotide but not with the PRP1 oligonucleotide.
Restriction enzyme analysis verified loss of the ApaI site
and acquisition of a new RsrII site diagnostic for the
changes in the P1 region. Fusion phage were also probed
with a radiolabelled oligonucleotide (VLP1) via the Dot
Blot Procedure. Autoradiography confirmed that fusion
phage which previously failed to hybridize to the PRP1
probe hybridized to the VLP1 probe. DNA sequencing
confirmed that the LYS 15 and ARG 17 codons had been
converted to VAL and LEU codons respectively. The Rf DNA
was designated BPTI(K15V,R17L)-III MA.

13) Affinity of BPTI(K15V,R17L)-III MA Phage for Immobilized
HNE

Forty µl of BPTI(K15V,R17L)-III MA phage (9.8·10^10
pfu/ml) in TBS/BSA were added to 10 µl of a 50% slurry of
immobilized HNE also in TBS/BSA. Similarly, 40 µl of
BPTI(K15L,MGNG)-III MA phage (5.13·10^9 pfu/ml) in TBS/BSA
were added to immobilized HNE. The samples were mixed for
1.5 hours on a Labquake shaker. Beads were washed once
for 5 min with 0.5 ml of TBS/BSA and then 5 times with 0.5
ml of TBS/1.0% Tween-20 as described previously. Subsequently,
the beads were washed sequentially with a series
of 50 mM sodium citrate buffers containing 150 mM NaCl,
1.0 mg/ml BSA of pH 7.0, 6.0, 5.0, 4.5, 4.0, 3.75, 3.5 and
3.0. In the case of the BPTI(K15L,MGNG)-III MA phage, the
pH 3.75 and 3.0 washes were omitted. Two washes were
performed at each pH and the supernatants were pooled,
neutralized with 1 M Tris pH 8.0, diluted in LB broth and
titered for plaque-forming units on a lawn of cells.

Table 206 illustrates that the pH 4.5 and 4.0
fractions contained the largest proportion of the recovered
BPTI(K15V,R17L)-III MA phage. By contrast, the
BPTI(K15L,MGNG)-III MA phage, like BPTI(K15L)-III MA
phage, were recovered predominantly in the pH 5.0 and 4.5
fractions, as shown above. The affinity of
BPTI(K15V,R17L) is 48 times greater than that of BPTI(K15L) for HNE
(based on reported values, AUER89 for BPTI(K15V,R17L)
and BECK88b for BPTI(K15L)). That the pH elution profile
for BPTI(K15V,R17L)-III MA phage exhibits a peak at pH 4.0
while the profile for BPTI(K15L)-III MA phage displays a
peak at pH 4.5 supports the contention that lower pH
conditions are required to dissociate, from immobilized
HNE, fusion phage displaying a BPTI variant with a higher
affinity for free HNE.
***

EXAMPLE IV

CONSTRUCTION OF A VARIEGATED POPULATION OF PHAGE DISPLAYING
BPTI DERIVATIVES AND FRACTIONATION FOR MEMBERS THAT
DISPLAY BINDING DOMAINS HAVING HIGH AFFINITY FOR HUMAN
NEUTROPHIL ELASTASE:

We here describe generation of a library of 1000
different potential engineered protease inhibitors
(PEPIs) and the fractionation with immobilized HNE to
obtain an engineered protease inhibitor (Epi) having high
affinity for HNE. Successful Epis that bind HNE are
designated EpiNEs.

1) Design of a Mutagenic Oligonucleotide to Create a
Library of Fusion Phage

A 76 bp variegated oligonucleotide (MYMUT) was
designed to construct a library of fusion phage displaying
1000 different PEPIs derived from BPTI. The oligonucleotide
contains 1728 different DNA sequences but, due to the
degeneracy of the genetic code, it encodes 1000 different
protein sequences. The oligonucleotide was designed so as
to destroy an ApaI site (shown in Table 113) encompassing
codons 12 and 13. ApaI digestion could be used to select
against the parental Rf DNA used to construct the library.
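
The figures of 1728 DNA sequences and 1000 protein sequences can be checked by enumerating the degenerate codons that this section assigns to positions 15 through 19 (DTS, GST, DTS, WYC and HMA). The following Python sketch is our own illustration, not part of the original procedure; it assumes the standard genetic code and counts stop codons as DNA sequences but not as encoded residues.

```python
from itertools import product

# Mixed bases as defined in the text (D, S, W, Y, H, M) plus the plain bases.
MIX = {"A": "A", "C": "C", "G": "G", "T": "T",
       "D": "ATG", "S": "CG", "W": "AT", "Y": "TC", "H": "ACT", "M": "AC"}

# Standard genetic code in the conventional T, C, A, G ordering ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Variegation scheme for residues 15-19 as described in the text.
SCHEME = {15: "DTS", 16: "GST", 17: "DTS", 18: "WYC", 19: "HMA"}


def expand(degenerate_codon):
    """List every concrete codon covered by a degenerate codon."""
    return ["".join(p) for p in product(*(MIX[b] for b in degenerate_codon))]


def library_size(scheme):
    """Return (number of DNA sequences, number of full-length proteins)."""
    n_dna, n_protein = 1, 1
    for codon in scheme.values():
        codons = expand(codon)
        residues = {CODE[c] for c in codons} - {"*"}  # stops encode no residue
        n_dna *= len(codons)
        n_protein *= len(residues)
    return n_dna, n_protein


for position, codon in SCHEME.items():
    print(position, codon, sorted({CODE[c] for c in expand(codon)}))
print(library_size(SCHEME))
```

The DTS codon covers six codons but only five amino acids because GTC and GTG both encode VAL, and one of the six HMA codons (TAA) is a stop; these two effects account for the gap between the DNA and protein counts.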

The MYMUT oligonucleotide permits the substitution of
5 hydrophobic residues (PHE, LEU, ILE, VAL, and MET via a
DTS codon (D = approximately equimolar A, T, and G; S =
approximately equimolar C and G)) for LYS 15. Replacement
of LYS 15 in BPTI with aliphatic hydrophobic residues via
semi-synthesis has provided proteins having higher
affinity for HNE than BPTI (TANN77, JERI74a,b, WENZ80,
TSCH86, BECK88b). At position 16, either GLY or ALA is
permitted (GST codon). This is in keeping with the
predominance of these two residues at the corresponding
positions in a variety of BPTI homologues (CREI87). The
variegation scheme at position 17 is identical to that at
15. Limited data are available on the relative contribution
of this residue to the interaction of BPTI homologues
with HNE. A variety of hydrophobic residues at position
17 was included with the anticipation that they would
enhance the docking of a BPTI variant with HNE. Finally,
at positions 18 and 19, 4 (PHE, SER, THR, and ILE via a
WYC codon (W = approximately equimolar A and T; Y =
approximately equimolar T and C)) and 5 (SER, PRO, THR,
LYS, GLN, and stop via an HMA codon (H = approximately
equimolar A, C, and T; M = approximately equimolar A and
C)) different amino acids respectively are encoded. These
different amino acid residues are found in the corresponding
positions of BPTI homologues that are known to bind to
HNE (CREI87). Although the amino acids included in the
PEPI library were chosen because there was some indication

DNA fragment from MK RF was radioactively labelled with
32P-α-dCTP using an oligolabelling kit (Pharmacia, Piscataway,
NJ). Nytran membranes were transferred from
pre-hybridization solution to Southern hybridization
solution (5Prime-3Prime) at 42 °C. The radioactive probe
was added to the hybridization solution and following
overnight incubation at 42 °C, the filter was washed 3
times with 2 x SSC, 0.1% SDS at room temperature and once
at 65 °C in 2 x SSC, 0.1% SDS. Nytran membranes were
subjected to autoradiography. The efficiency of the
affinity selection system can be semi-quantitatively
determined using the above dot blot procedure. Comparison
of dots A1 and B1 or C1 and D1 indicates that the majority
of phage did not stick to the streptavidin-agarose beads.
Washing with TBS/Tween buffer removes the majority of
phage which are non-specifically associated with streptavidin
beads. Exposure of the streptavidin beads to
elution buffer releases bound phage only in the case of
MK-BPTI phage which have previously been incubated with
biotinylated rabbit anti-BPTI IgG. These data indicate
that the affinity selection system described above can be
utilized to select for phage displaying a specific antigen
(in this case BPTI). We estimate an enrichment factor of
at least 40-fold based on the calculation

Enrichment Factor = (Percent MK-BPTI phage recovered) /
                    (Percent MK phage recovered)
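
The enrichment ratio defined above can be sketched in Python as follows. The function names are ours, and the pfu counts in the usage lines are hypothetical placeholders chosen only to illustrate a 40-fold result; they are not measurements from this Example.

```python
def percent_recovered(eluted_pfu, input_pfu):
    """Percent of input phage recovered in the elution fraction."""
    return 100.0 * eluted_pfu / input_pfu


def enrichment_factor(percent_display_phage, percent_control_phage):
    """Ratio of percent recovery of displaying phage to that of control phage."""
    return percent_display_phage / percent_control_phage


# HYPOTHETICAL counts, for illustration only.
pct_mk_bpti = percent_recovered(eluted_pfu=4.0e6, input_pfu=1.0e10)
pct_mk = percent_recovered(eluted_pfu=1.0e5, input_pfu=1.0e10)
print(f"enrichment factor: {enrichment_factor(pct_mk_bpti, pct_mk):.1f}")
```

Normalizing each phage type to its own input before taking the ratio keeps the result independent of differences in the amount of each phage applied to the beads.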


EXAMPLE III

CHARACTERIZATION AND FRACTIONATION OF CLONALLY PURE
POPULATIONS OF PHAGE, EACH DISPLAYING A SINGLE CHIMERIC
APROTININ HOMOLOGUE/M13 GENE III PROTEIN:
This Example demonstrates that chimeric phage
proteins displaying a target-binding domain can be eluted
from immobilized target by decreasing pH, and that the pH at
which the protein is eluted is dependent on the binding
affinity of the domain for the target.

Standard Procedures:

Unless otherwise noted, all manipulations were
carried out at room temperature. Unless otherwise noted,
all cells are XL1-Blue(™) (Stratagene, La Jolla, CA).

1) Demonstration of the Binding of BPTI-III MK Phage to
Active Trypsin Beads

Previous experiments designed to verify that BPTI
displayed by fusion phage is functional relied on the use
of immobilized anhydro-trypsin, a catalytically inactive
form of trypsin. Although anhydro-trypsin is essentially
identical to trypsin structurally (HUBE75, YOKO77) and in
binding properties (VINC74, AKOH72), we demonstrated that
BPTI-III fusion phage also bind immobilized active
trypsin. Demonstration of the binding of fusion phage to
immobilized active protease and subsequent recovery of
infectious phage facilitates subsequent experiments where
the preparation of inactive forms of serine proteases by
protein modification is laborious or not feasible.

Fifty µl of BPTI-III MK phage (identified as MK-BPTI
in USSN 07/487,063) (3.7·10^11 pfu/ml) in either 50 mM
Tris, pH 7.5, 150 mM NaCl, 1.0 mg/ml BSA (TBS/BSA) buffer
or 50 mM sodium citrate, pH 6.5, 150 mM NaCl, 1.0 mg/ml
BSA (CBS/BSA) buffer were added to 10 µl of a 25% slurry
of immobilized trypsin (Pierce Chemical Co., Rockford, IL)
also in TBS/BSA or CBS/BSA. As a control, 50 µl MK phage
(9.3·10^12 pfu/ml) were added to 10 µl of a 25% slurry of
immobilized trypsin in either TBS/BSA or CBS/BSA buffer.
The infectivity of BPTI-III MK phage is 25-fold lower than
that of MK phage; thus the conditions chosen above ensure
that an approximately equivalent number of phage particles
are added to the trypsin beads. After 3 hours of mixing
on a Labquake shaker (Labindustries Inc., Berkeley, CA),
0.5 ml of either TBS/BSA or CBS/BSA was added where
appropriate to the samples. Beads were washed for 5 min
and recovered by centrifugation for 30 sec. The supernatant
was removed and 0.5 ml of TBS/0.1% Tween-20 was
added. The beads were mixed for 5 minutes on the shaker
and recovered by centrifugation as above. The supernatant
was removed and the beads were washed an additional five
times with TBS/0.1% Tween-20 as described above. Finally,
the beads were resuspended in 0.5 ml of elution buffer
(0.1 M HCl containing 1.0 mg/ml BSA adjusted to pH 2.2
with glycine), mixed for 5 minutes and recovered by
centrifugation. The supernatant fraction was removed and
neutralized by the addition of 130 µl of 1 M Tris, pH 8.0.
Aliquots of the neutralized elution sample were diluted in
LB broth and titered for plaque-forming units on a lawn of
cells.
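
The statement that the two inputs contain an approximately equivalent number of phage particles can be checked from the titers and the 25-fold infectivity difference given above. The short Python sketch below is our own illustration; it assumes that particle number scales as pfu divided by relative infectivity, with MK phage taken as the reference (relative infectivity 1).

```python
def relative_particles(volume_ul, titer_pfu_per_ml, relative_infectivity):
    """Particle count in units where the reference phage has infectivity 1."""
    pfu = volume_ul * 1e-3 * titer_pfu_per_ml  # µl -> ml, then pfu
    return pfu / relative_infectivity


# Values from the text: 50 µl of each phage stock.
mk_particles = relative_particles(50, 9.3e12, 1.0)
bpti_mk_particles = relative_particles(50, 3.7e11, 1.0 / 25.0)

ratio = bpti_mk_particles / mk_particles
print(f"BPTI-III MK : MK particle ratio = {ratio:.3f}")
```

Under this assumption the two inputs differ by less than one percent, even though their plaque-forming titers differ by about 25-fold.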

Table 201 illustrates that a significant percentage
of the input BPTI-III MK phage bound to immobilized
trypsin and was recovered by washing with elution buffer.
The amount of fusion phage which bound to the beads was
greater in TBS buffer (pH 7.5) than in CBS buffer (pH
6.5). This is consistent with the observation that the
affinity of BPTI for trypsin is greater at pH 7.5 than at
pH 6.5 (VINC72, VINC74). A much lower percentage of the
MK control phage (which do not display BPTI) bound to
immobilized trypsin and this binding was independent of
the pH conditions. At pH 6.5, 1675 times more of the
BPTI-III MK phage than of the MK phage bound to trypsin
beads while at pH 7.5, a 2103-fold difference was observed.
Hence fusion phage displaying BPTI adhere not
only to anhydro-trypsin beads but also to active trypsin
beads and can be recovered as infectious phage. These
data, in conjunction with earlier findings, strongly
suggest that BPTI displayed on the surface of fusion phage
is appropriately folded and functional.

2) Generation of P1 Mutants of BPTI

To demonstrate the specificity of interaction of
BPTI-III fusion phage with immobilized serine proteases,
single amino acid substitutions were introduced at the P1
position (residue 15 of mature BPTI) of the BPTI-III
fusion protein by site-directed mutagenesis. A 25mer
mutagenic oligonucleotide (P1) was designed to substitute
a LEU codon for the LYS 15 codon. This alteration is
desired because BPTI(K15L) is a moderately good inhibitor
of human neutrophil elastase (HNE) (K_d = 2.9·10^-9 M)
(BECK88b) and a poor inhibitor of trypsin. A fusion phage
displaying BPTI(K15L) should bind to immobilized HNE but
not to immobilized trypsin. BPTI-III MK fusion phage
would be expected to display the opposite phenotype (bind
to trypsin, fail to bind to HNE). These observations
would illustrate the binding specificity of BPTI-III
fusion phage for immobilized serine proteases.

Mutagenesis of the P1 region of the BPTI-VIII gene
contained within the intergenic region of recombinant
phage MB46 was carried out using the Muta-Gene M13 In
Vitro Mutagenesis Kit (Bio-Rad, Richmond, CA). MB46 phage
(7.5·10^6 pfu) were used to infect a 50 ml culture of CJ236
cells (O.D.600 = 0.5). Following overnight incubation at
37 °C, phage were recovered and uracil-containing single-stranded
DNA was extracted from the phage. The single-stranded
DNA was further purified by NACS chromatography
as recommended by the manufacturer (B.R.L., Gaithersburg,
MD).

Two hundred nanograms of the purified single-stranded 
DNA were annealed to 3 picomoles of the phosphorylated 
25mer mutagenic oligonucleotide (P1). Following filling 
in with T4 DNA polymerase and ligation with T4 DNA ligase, 
the sample was used to transfect competent cells which 
were subsequently plated on LB plates to permit the 
formation of plaques. Phage derived from picked plaques 
were applied to a Nytran membrane using a Schleicher and 
Schuell (Keene, NH) Minifold I apparatus (Dot Blot 
Procedure). Phage DNA was immobilized onto the filter by 
baking at 80 °C for 2 hours. The filter was bathed in 1 X 
Southern pre-hybridization buffer (5Prime-3Prime, West 
Chester, PA) for 2 hours. Subsequently, the filter was 
incubated in 1 X Southern hybridization solution 
(5Prime-3Prime) containing a 21mer probing oligonucleotide 
(LEU1) which had been radioactively labelled with 
gamma-32P-ATP (N.E.N./DuPont, Boston, MA) by T4 
polynucleotide kinase (New England BioLabs (NEB), Beverly, 
MA). Following overnight hybridization, the filter was 
washed 3 times with 6 X SSC at room temperature and once at 
60 °C in 6 X SSC prior to autoradiography. Clones exhibiting 
strong hybridization signals were chosen for large scale Rf 
preparation using the PZ523 spin column protocol 
(5Prime-3Prime). Restriction enzyme analysis confirmed that 
the structure of the Rf was correct and DNA sequencing 
confirmed the substitution of a LEU codon (TTG) for the 
LYS15 codon (AAA). This Rf DNA was designated MB46(K15L). 
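
The codon change confirmed by sequencing can be checked against the standard genetic code. The following is a minimal sketch only; the abbreviated codon table and the helper name `describe_substitution` are illustrative assumptions and are not part of the procedure described above.

```python
# Minimal sketch: check the K15L codon change (AAA -> TTG) reported above.
# Only the codons needed here are listed; assignments follow the standard genetic code.
CODON_TABLE = {
    "AAA": "LYS", "AAG": "LYS",
    "TTA": "LEU", "TTG": "LEU",
    "CTT": "LEU", "CTC": "LEU", "CTA": "LEU", "CTG": "LEU",
}


def describe_substitution(old_codon, new_codon):
    """Return (old residue, new residue, number of differing bases)."""
    differing = sum(1 for a, b in zip(old_codon, new_codon) if a != b)
    return CODON_TABLE[old_codon], CODON_TABLE[new_codon], differing


result = describe_substitution("AAA", "TTG")
print(result)  # ('LYS', 'LEU', 3)
```

All three bases of the codon differ, which plausibly helps the LEU1 probe distinguish mutant from parental clones under the 60 °C wash.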

3) Generation of the BPTI-III MA Vector 

The original gene III fusion phage MK can be detected 
on the basis of its ability to transduce cells to 
kanamycin resistance (Km^R). It was deemed advantageous to 
generate a second gene III fusion vector which can confer 
resistance to a different antibiotic, namely ampicillin 
(Ap). One could then mix a fusion phage conferring Ap^R 
while displaying engineered protease inhibitor A (EPI-A) 
with a second fusion phage conferring Km^R while displaying 
EPI-B. The mixture could be added to an immobilized 
serine protease and, following elution of bound fusion 
phage, one could evaluate the relative affinity of the two 
EPIs for the immobilized protease from the relative 
abundance of phage that transduce cells to Km^R or Ap^R. 
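
The comparison described above can be sketched as a ratio of fractional recoveries. This is an illustrative calculation only: the function name `relative_recovery` and all counts are hypothetical, and a recovery ratio is at best a rough proxy for relative affinity.

```python
def relative_recovery(input_ap, input_km, eluted_ap, eluted_km):
    """Ratio of fractional recoveries (Ap^R phage : Km^R phage).

    Each argument is a count of transducing units for one marker,
    measured in the input mixture or in the eluate.
    """
    fraction_ap = eluted_ap / input_ap
    fraction_km = eluted_km / input_km
    return fraction_ap / fraction_km


# Hypothetical transductant counts, for illustration only.
ratio = relative_recovery(input_ap=1e9, input_km=1e9,
                          eluted_ap=5e5, eluted_km=1e4)
print(f"Ap^R phage recovered {ratio:.0f}-fold more efficiently than Km^R phage")
```

Normalizing each marker to its own input guards against an apparent preference that merely reflects unequal amounts of the two phage in the starting mixture.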

The Ap^R gene is contained in the vector pGem3Zf 
(Promega Corp., Madison, WI) which can be packaged as 
single stranded DNA contained in bacteriophage when helper 
phage are added to bacteria containing this vector. The 
recognition sites for restriction enzymes SmaI and SnaBI 
were engineered into the 3' non-coding region of the Ap^R 
(β-lactamase) gene using the technique of synthetic 
oligonucleotide directed site specific mutagenesis. The 
single stranded DNA was used as the template for in vitro 
mutagenesis leading to the following DNA sequence 
alterations (numbering as supplied by Promega): a) to create 
a SmaI (or XmaI) site, bases T1115 -> C and A1116 -> C, and 
b) to create a SnaBI site, G1125 -> T, C1129 -> T, and 
T1130 -> A. The alterations were confirmed by radiolabelled 
probe analysis with the mutating oligonucleotide and 
restriction enzyme analysis; this plasmid is named pSGK3. 

Plasmid pSGK3 was cut with AatII and SmaI and treated 
with T4 DNA polymerase (NEB) to remove overhanging 3' ends 
(MANI82, SAMB89). Phosphorylated HindIII linkers (NEB) 
were ligated to the blunt ends of the DNA and, following 
HindIII digestion, the 1.1 kb fragment was isolated by 
agarose gel electrophoresis followed by purification on an 
Ultrafree-MC filter unit as recommended by the 
manufacturer (Millipore, Bedford, MA). M13-MB1/2-delta Rf DNA 
was cut with HindIII and the linearized Rf was purified 
and ligated to the 1.1 kb fragment derived from pSGK3. 
Ligation samples were used to transfect competent cells 
which were plated on LB plates containing Ap. Colonies 
were picked and grown in LB broth containing Ap overnight 
at 37 °C. Aliquots of the culture supernatants were 
assayed for the presence of infectious phage. Rf DNA was 
prepared from cultures which were both Ap^R and contained 
infectious phage. Restriction enzyme analysis confirmed 
that the Rf contained a single copy of the Ap^R gene 
inserted into the intergenic region of the M13 genome in 
the same transcriptional orientation as the phage genes. 
This Rf DNA was designated MA. 

The 5.9 kb BglII/BsmI fragment from MA Rf DNA and the 
2.2 kb BglII/BsmI fragment from BPTI-III MK Rf DNA were 
ligated together and a portion of the ligation mixture was 
used to transfect competent cells which were subsequently 
plated to permit plaque formation on a lawn of cells. 
Large and small size plaques were observed on the plates. 
Small size plaques were picked for further analysis since 
BPTI-III fusion phage give rise to small plaques due to 
impairment of gene III protein function. Small plaques 
were added to LB broth containing Ap and cultures were 
incubated overnight at 37 °C. An Ap^R culture which 
contained phage which gave rise to small plaques when 
plated on a lawn of cells was used as a source of Rf DNA. 
Restriction enzyme analysis confirmed that the BPTI-III 
fusion gene had been inserted into the MA vector. This Rf 
was designated BPTI-III MA. 

4) Construction of BPTI(K15L)-III MA 

MB46(K15L) Rf DNA was digested with XhoI and EagI and 
the 125 bp DNA fragment was isolated by electrophoresis on 
a 2% agarose gel followed by extraction from an agarose 
slice by centrifugation through an Ultrafree-MC filter 
unit. The 8.0 kb XhoI/EagI fragment derived from BPTI-III 
MA Rf was also prepared. The above two fragments were 
ligated and the ligation sample was used to transfect 
competent cells which were plated on LB plates containing 
Ap. Colonies were picked and used to inoculate LB broth 
containing Ap. Cultures were incubated overnight at 37 °C 
and phage within the culture supernatants were probed using 
the Dot Blot Procedure. Filters were hybridized to a 
radioactively labelled oligonucleotide (LEU1). Positive 
clones were identified by autoradiography after washing 
filters under high stringency conditions. Rf DNA was 
prepared from Ap^R cultures which contained phage carrying 
the K15L mutation. Restriction enzyme analysis and DNA 
sequencing confirmed that the K15L mutation had been 
introduced into the BPTI-III MA Rf. This Rf was 
designated BPTI(K15L)-III MA. Interestingly, BPTI(K15L)-III MA 
phage gave rise to extremely small plaques on a lawn of 
cells and the infectivity of the phage is 4 to 5 fold less 
than that of BPTI-III MK phage. This suggests that the 
substitution of LEU for LYS15 impairs the ability of the 
BPTI:gene III fusion protein to mediate phage infection of 
bacterial cells. 

5) Preparation of Immobilized Human Neutrophil Elastase 

One ml of Reacti-Gel 6X CDI activated agarose 
(Pierce Chemical Co.) in acetone (200 µl packed beads) was 
introduced into an empty Select-D spin column 
(5Prime-3Prime). The acetone was drained out and the beads were 
washed twice rapidly with 1.0 ml of ice cold water and 1.0 
ml of ice cold 100 mM boric acid, pH 8.5, 0.9% NaCl. Two 
hundred µl of 2.0 mg/ml human neutrophil elastase (HNE) 
(CalBiochem, San Diego, CA) in borate buffer were added to 
the beads. The column was sealed and mixed end over end 
on a Labquake Shaker at 4 °C for 36 hours. The HNE 
solution was drained off and the beads were washed with 
ice cold 2.0 M Tris, pH 8.0 over a 2 hour period at 4 °C to 
block remaining reactive groups. A 50% slurry of the 
beads in TBS/BSA was prepared. To this was added an equal 
volume of sterile 100% glycerol and the beads were stored 
as a 25% slurry at -20 °C. Prior to use, the beads were 
washed 3 times with TBS/BSA and a 50% slurry in TBS/BSA 
was prepared. 
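
The quantities in this preparation follow from simple arithmetic, sketched below. The function names are illustrative assumptions; the calculation gives the amount of HNE offered to the beads, not the amount actually coupled, which the text does not report.

```python
def hne_offered_mg(volume_ul, conc_mg_per_ml):
    """Mass of HNE (mg) added to the beads."""
    return volume_ul / 1000.0 * conc_mg_per_ml


def slurry_fraction_after_dilution(bead_fraction, volumes_added):
    """Bead fraction after adding `volumes_added` volumes of liquid
    per volume of starting slurry."""
    return bead_fraction / (1.0 + volumes_added)


# 200 µl of 2.0 mg/ml HNE offered to 200 µl of packed beads.
print(hne_offered_mg(200, 2.0), "mg HNE offered")

# A 50% slurry mixed with an equal volume of glycerol.
print(slurry_fraction_after_dilution(0.50, 1.0), "bead fraction in storage slurry")
```

The second result reproduces the 25% storage slurry stated in the text.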

6) Characterization of the Affinity of BPTI-III MK and 
BPTI(K15L)-III MA Phage for Immobilized Trypsin and Human 
Neutrophil Elastase 

Thirty µl of BPTI-III MK phage in TBS/BSA (1.7 x 10^11 
pfu/ml) was added to 5 µl of a 50% slurry of either 
immobilized human neutrophil elastase or immobilized 
trypsin (Pierce Chemical Co.) also in TBS/BSA. Similarly, 
30 µl of BPTI(K15L)-III MA phage in TBS/BSA (3.2 x 10^10 
pfu/ml) was added to either immobilized HNE or trypsin. 
Samples were mixed on a Labquake shaker for 3 hours. The 
beads were washed with 0.5 ml of TBS/BSA for 5 minutes and 
recovered by centrifugation. The supernatant was removed 
and the beads were washed 5 times with 0.5 ml of TBS/0.1% 
Tween-20. Finally, the beads were resuspended in 0.5 ml 
of elution buffer (0.1 M HCl containing 1.0 mg/ml BSA 
adjusted to pH 2.2 with glycine), mixed for 5 minutes and 
recovered by centrifugation. The supernatant fraction was 
removed, neutralized with 130 µl of 1 M Tris, pH 8.0, 
diluted in LB broth, and titered for plaque-forming units 
on a lawn of cells. 
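
The input and recovery figures for such a binding experiment can be worked out as sketched below. The volumes and input titers are taken from the procedure above; the function names and the eluate titer are hypothetical and are not values from Table 202.

```python
def input_pfu(volume_ul, titer_pfu_per_ml):
    """Total plaque-forming units in a phage aliquot."""
    return volume_ul / 1000.0 * titer_pfu_per_ml


def fraction_recovered(eluate_titer_pfu_per_ml, total_input_pfu,
                       elution_ml=0.5, neutralizer_ml=0.130):
    """Fraction of input phage found in the neutralized eluate.

    eluate_titer_pfu_per_ml is the titer of the neutralized eluate,
    already corrected for any further dilution in LB broth.
    """
    eluate_pfu = eluate_titer_pfu_per_ml * (elution_ml + neutralizer_ml)
    return eluate_pfu / total_input_pfu


mk_input = input_pfu(30, 1.7e11)      # BPTI-III MK phage
k15l_input = input_pfu(30, 3.2e10)    # BPTI(K15L)-III MA phage
print(f"{mk_input:.2e} pfu, {k15l_input:.2e} pfu")

# Hypothetical eluate titer, for illustration only.
print(f"{fraction_recovered(1.0e6, mk_input):.2e}")
```

Because the two phage stocks differ about five-fold in titer, comparing fractions of input rather than raw eluate titers keeps the two experiments on the same footing.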

Table 202 illustrates that 82 times more of the 
BPTI-III MK input phage bound to the trypsin beads than to the 
HNE beads. By contrast, the BPTI(K15L)-III MA phage bound 
preferentially to HNE beads by a factor of 36. These 
results are consistent with the known affinities of wild 
type and the K15L variant of BPTI for trypsin and HNE. 
Hence BPTI-III fusion phage bind selectively to 
immobilized proteases and the nature of the BPTI variant 
displayed on the surface of the fusion phage dictates which 
particular protease is the optimum receptor for the fusion 
phage. 

achieved sufficient mass. Poor growth and insert 
instability can be circumvented to a large extent, giving this 
system an advantage over the Gem-based vector described 
above. 

An overnight bacterial culture of XL1-Blue(TM) or 
SEF' is grown in LB medium containing tetracycline (50 µg 
per ml) to ensure the presence of pili as sites for 
bacteriophage binding and infection. This culture is 
diluted 100-fold into NZCYM medium containing tetracycline 
and bacterial growth allowed to proceed in an incubator 
shaker until a cell density of 1.0 (Ab 600nm) has been 
achieved. Phage, containing the expression vector and 
gene of interest, are added to the bacterial culture at a 
multiplicity of infection (MOI) of 10 and allowed to 
infect the cells for 30 minutes. Gene expression is then 
induced by the addition of IPTG to a final concentration 
of 0.5 mM and the culture allowed to grow overnight. 
Media collection and cell fractionation are as described 
elsewhere. 
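
The volume of phage stock needed to reach a given MOI can be estimated as sketched below. This is an illustration under stated assumptions: the conversion of about 8 x 10^8 cells per ml per absorbance unit is a common rule of thumb rather than a value from this document, and the culture volume and phage titer are hypothetical.

```python
# Assumption: rough rule of thumb for E. coli; calibrate for the strain in use.
CELLS_PER_ML_PER_OD = 8e8


def phage_volume_ml(culture_ml, od600, moi, phage_titer_pfu_per_ml):
    """Volume of phage stock (ml) giving the requested multiplicity of infection."""
    cells = culture_ml * od600 * CELLS_PER_ML_PER_OD
    return moi * cells / phage_titer_pfu_per_ml


# Hypothetical 50 ml culture at an absorbance of 1.0, MOI of 10, stock at 1e12 pfu/ml.
volume = phage_volume_ml(culture_ml=50, od600=1.0, moi=10,
                         phage_titer_pfu_per_ml=1e12)
print(f"{volume:.2f} ml of phage stock")
```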

Bacterial Cell Fractionation. 

After heterologous gene expression the bacterial cell 
culture can be separated into the following fractions: 
conditioned medium, periplasmic fraction and 
post-periplasmic cell lysate. This is achieved using the following 
procedures. 

The culture is centrifuged to pellet the bacteria, 
allowing the supernatant to be stored as conditioned 
medium. This fraction contains any exported proteins. 
The pellet is taken up in 20% sucrose, 30 mM Tris pH 8 and 
1 mM EDTA (80 ml of buffer per gram of fresh weight pellet) 
and allowed to sit at room temperature for 10 minutes. 
The cells are repelleted and taken up in the same volume 
of ice cold 5 mM MgSO4 and left on ice for 10 minutes. 
Following centrifugation to pellet the cells, the 
supernatant (periplasmic fraction) is stored. A second 
round of osmotic shock fractionation can be undertaken if 
desired. 

The post-periplasmic pellet can be further lysed as 
follows. The pellet is resuspended in 1.5 ml of 20% 
sucrose, 40 mM Tris pH 8, 50 mM EDTA and 2.5 mg of lysozyme 
(per gram fresh weight of starting pellet). After 15 
minutes at room temperature 1.15 ml of 0.1% Triton X is 
added together with 300 µl of 5 M NaCl and incubated for a 
further 15 minutes. 2.5 ml of 0.2 M triethanolamine (pH 
7.8), 150 µl of 1 M CaCl2, 100 µl of 1 M MgCl2 and 5 µg of 
DNase are added and allowed to incubate, with 
end-over-end mixing, for 20 minutes to reduce viscosity. This is 
followed by centrifugation with the supernatant being 
retained as the post-periplasmic lysate. 

The present invention is not, of course, limited to 
any particular expression system, whether bacterial or 
not. 

EXAMPLE IX 

CONSTRUCTION OF AN ITI-DOMAIN I/GENE III DISPLAY VECTOR 
1. ITI domain I as an IPBD 

Inter-α-trypsin inhibitor (ITI) is a large (M_r ca. 
240,000) circulating protease inhibitor found in the 
plasma of many mammalian species (for recent reviews see 
ODOM90, SALI90, GEBH90, GEBH86). The intact inhibitor is 
a glycoprotein and is currently believed to consist of 
three glycosylated subunits that interact through a strong 
glycosaminoglycan linkage (ODOM90, SALI90, ENGH89, 
SELL87). The anti-trypsin activity of ITI is located on 
the smallest subunit (ITI light chain, unglycosylated M_r 
ca. 15,000) which is identical in amino acid sequence to an 
acid stable inhibitor found in urine (UTI) and serum (STI) 
(GEBH86, GEBH90). The mature light chain consists of a 21 
residue N-terminal sequence, glycosylated at SER10, 
followed by two tandem Kunitz-type domains, the first of 
which is glycosylated at ASN45 (ODOM90). In the human 
protein, the second Kunitz-type domain has been shown to 
inhibit trypsin, chymotrypsin, and plasmin (ALBR83a, 
ALBR83b, SELL87, SWAI88). The first domain lacks these 
activities but has been reported to inhibit leukocyte 
elastase (10^-6 > K_i > 10^-9) (ALBR83a,b, ODOM90). cDNA 
encoding the ITI light chain also codes for 
α-1-microglobulin (TRAB86, KAUM86, DIAR90); the proteins are separated 
post-translationally by proteolysis. 

The N-terminal Kunitz-type domain of the ITI light chain 
(ITI-D1, comprising residues 22 to 76 of the UTI sequence 
shown in Fig. 1 of GEBH86) possesses a number of 
characteristics that make it useful as an IPBD. The domain is 
highly homologous to both BPTI and the EpiNE series of 
proteins described elsewhere in the present application. 
Although an x-ray structure of the isolated domain is not 
available, crystallographic studies of the related 
Kunitz-type domain isolated from the Alzheimer's amyloid 
β-protein (AAβP) precursor show that this polypeptide 
assumes a crystal structure almost identical to that of 
BPTI (HYNE90). Thus, it is likely that the solution 
structure of the isolated ITI-D1 polypeptide will be 
highly similar to the structures of BPTI and AAβP. In 
this case, the advantages described previously for use of 
BPTI as an IPBD apply to ITI-D1. ITI-D1 provides 
additional advantages as an IPBD for the development of 
specific anti-elastase inhibitory activity. First, this 
domain has been reported to inhibit both leukocyte 
elastase (ALBR83a,b, ODOM90) and Cathepsin-G (SWAI88, 
ODOM90), activities which BPTI lacks. Second, ITI-D1 
lacks affinity for the related serine proteases trypsin, 
chymotrypsin, and plasmin (ALBR83a,b, SWAI88), an 
advantage for the development of specificity in inhibition. 
Finally, ITI-D1 is a human-derived polypeptide so 
derivatives are anticipated to show minimal antigenicity in 
clinical applications. 

2. Construction of the display vector. 

For purposes of this discussion, numbering of the 
nucleic acid sequence for the ITI light chain gene is that 
of TRAB86 and of the amino acid sequence is that shown for 
UTI in Fig. 1 of GEBH86. DNA manipulations were conducted 
according to standard methods as described in SAMB89 and 
AUSU87. 

The protein sequence of human ITI-D1 consists of 56 
amino acid residues extending from LYS22 to ARG77 of the 
complete ITI light chain sequence. This sequence is 
encoded by the 168 bases between positions 750 and 917 in 
the cDNA sequence presented in TRAB86. The majority of 
the domain is contained between a BglI site spanning 
bases 663 to 773 and a PstI site spanning bases 903 to 
908. The insertion of the ITI-D1 sequence into M13 gene 
III was conducted in two steps. First, a linker containing 
the appropriate ITI sequences outside the central BglI to 
PstI region was ligated into the NarI site of phage MA RF 
DNA. In the second step, the remainder of the ITI-D1 
sequence was incorporated into the linker-bearing phage RF 
DNA. 
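
The stated domain length can be cross-checked against the stated coordinates. The sketch below uses only the residue and base positions quoted in the paragraph above; the helper name `span_inclusive` is an illustrative assumption.

```python
def span_inclusive(first, last):
    """Number of positions from `first` to `last`, counting both ends."""
    return last - first + 1


n_residues = span_inclusive(22, 77)    # LYS22 to ARG77
n_bases = span_inclusive(750, 917)     # cDNA positions 750 to 917

print(n_residues, n_bases, n_bases == 3 * n_residues)
```

Both counts agree with the text: 56 residues encoded by 168 bases, three bases per residue.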

The linker DNA consisted of two synthetic 
oligonucleotides (top and bottom strands) which, when annealed, 
produced a 54 bp double-stranded fragment with the 
following structure (5' to 3'): 

NARI OVERHANG/ITI-5'/BGLI/STUFFER/PSTI/ITI-3'/NARI OVERHANG 

The NarI OVERHANG sequences provide compatible ends 
for ligation into a cut NarI site. The ITI-5' sequence 
consists of dsDNA corresponding to the thirteen positions 
from A750 to T662 immediately 5' adjacent to the BglI 
site in the ITI-D1 sequence. Two changes, both silent, 
are introduced in this sequence: T to C at position 658 
(changes codon for ASP24 from GAT to GAC) and G to T at 
position 661 (changes codon for SER25 from TCG to TCT). 
The sequences BGLI and PSTI are identical to the BglI and 
PstI sites, respectively, in the ITI-D1 sequence. The 
ITI-3' sequence consists of dsDNA corresponding to the 
nine positions from A909 to T917 immediately 3' adjacent 
to the PstI site in the ITI-D1 sequence. The one base 
change included in this sequence, A to T at position 917, 
is silent and changes the codon for ARG77 from CGA to CGT. 
The STUFFER sequence consists of dsDNA encoding three 
residues (5' to 3'): LEU (TTA), TRP (TGG), and SER (TCA). 
The reverse complement of the STUFFER sequence encodes two 
translation termination codons (TGA and TAA). Phage 
expressing gene III containing the linker in opposite 
orientation to that shown above will not produce a 
functional gene III product. 
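
Two properties of the linker design can be verified directly: the three base changes are silent, and the reverse complement of the STUFFER carries two stop codons. The sketch below is illustrative; the helper names are assumptions, the codon table lists only the codons involved, and the stop codons are read in the codon frame of the STUFFER itself.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Only the codons named in the text; assignments follow the standard genetic code.
CODON_TABLE = {"GAT": "ASP", "GAC": "ASP",
               "TCG": "SER", "TCT": "SER",
               "CGA": "ARG", "CGT": "ARG"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def codons(seq):
    """Split a sequence into successive complete triplets."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def is_silent(old_codon, new_codon):
    """True if both codons encode the same residue."""
    return CODON_TABLE[old_codon] == CODON_TABLE[new_codon]


stuffer = "TTA" + "TGG" + "TCA"            # LEU, TRP, SER
flipped = reverse_complement(stuffer)
stops = [c for c in codons(flipped) if c in STOP_CODONS]
print(flipped, stops)                       # TGACCATAA ['TGA', 'TAA']

silent = [is_silent("GAT", "GAC"), is_silent("TCG", "TCT"), is_silent("CGA", "CGT")]
print(silent)                               # [True, True, True]
```

The two stop codons in the flipped STUFFER are what make a wrongly oriented linker self-reporting: such phage cannot produce a functional gene III product.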

Phage MA RF DNA was digested with NarI and the 
linear ca. 8.2 kb fragment was gel purified and 
subsequently dephosphorylated using HK phosphatase (Epicentre). 
The linker oligonucleotides were annealed to form the 
linker fragment described above, which was then kinased 
using T4 Polynucleotide Kinase. The kinased linker was 
ligated to the NarI-digested MA RF DNA in a 10:1 
(linker:RF) molar ratio. After 18 hrs at 16 °C, the ligation was 
stopped by incubation at 65 °C for 10 min and the ligation 
products were ethanol precipitated in the presence of 10 
µg of yeast tRNA. The dried precipitate was dissolved in 
5 µl of water and used to transform D1210 cells by 
electroporation. After 60 min of growth in SOC at 37 °C, 
transformed cells were plated onto LB plates supplemented 
with ampicillin (Ap, 200 µg/ml). RF DNA prepared from 
Ap^r isolates was subjected to restriction enzyme analysis. 
The DNA sequences of the linker insert and the immediately 
surrounding regions were confirmed by DNA sequencing. 
Phage strains containing the ITI Linker sequence inserted 
into the NarI site in gene III are called MA-IL. 

Phage MA-IL RF DNA was partially digested with BglI 
and the ca. 8.2 kb linear fragment was gel purified. This 
fragment was digested with PstI and the large linear 
fragment was gel purified. The BglI to PstI fragment of 
ITI-D1 was isolated from pMGIA (a plasmid carrying the 
sequence shown in TRAB86). pMGIA was digested to 
completion with BglI and the ca. 1.6 kb fragment was isolated by 
agarose gel electrophoresis and subsequent Geneclean 
(Bio101, La Jolla, CA) purification. The purified BglI 
fragment was digested to completion with PstI and EcoRI 
and the resulting mixture of fragments was used in a 
ligation with the BglI and PstI cut MA-IL RF DNA described 
above. Ligation, transformation, and plating were as 
described above. After 18 hr of growth on LB Ap plates 
at 37 °C, Ap^r colonies were harvested with LB broth 
supplemented with Ap (200 µg/ml) and the resulting cell 
suspension was grown for two hours at 37 °C. Cells were 
pelleted by centrifugation (10 min at 5000xg, 4 °C). The 
supernatant fluid was transferred to sterile 
centrifugation tubes and recentrifuged as above. The supernatant 
fluid from the second centrifugation step was retained as 
the phage stock POP1. 

PCR was used to demonstrate the presence of phage 
containing the complete ITI-D1-III fusion gene. Upstream 
PCR primers, 1UP and 2UP, are located spanning nucleotides 
1470 to 1494 and 1593 to 1618 of the phage M13 DNA 
sequence, respectively. A downstream PCR primer 3DN spans 
nucleotides 1779 to 1804. Two ITI-D1-specific primers, 
IAI-1 and IAI-2, are located spanning positions 789 to 810 
and 894 to 914, respectively, in the ITI light chain 
sequence of TRAB86. IAI-1 and IAI-2 are used as 
downstream primers in PCR reactions with 1UP or 2UP. IAI-1 is 
entirely contained within the BglI to PstI region of the 
ITI-D1 sequence, while IAI-2 spans the PstI site in the 
ITI-D1 sequence. When aliquots of POP1 phage were used as 
substrates for PCR, template-specific products of 
characteristic size were produced in reactions containing 1UP or 
2UP plus IAI-1 or IAI-2 primer pairs. No such products 
are obtained using MA-IL phage as template. No PCR 
products with sizes corresponding to complete ITI-D1-gene 
III templates were obtained using POP1 phage and the 
1UP or 2UP plus 3DN primer pairs. This last result 
reflects the low abundance (<1%) of phage containing the 
complete ITI-D1 sequence in POP1. 

Preparative PCR was used to generate substrate 
amounts of the 330 bp PCR product of a reaction using the 
1UP and IAI-2 primer pair to amplify the POP1 template. 
The 330 bp PCR product was gel purified and then cut to 
completion with BglI and PstI. The 138 bp BglI to PstI 
fragment from ITI-D1 was isolated by agarose gel 
electrophoresis followed by Qiaex extraction (Qiagen, Studio 
City, CA). MA-IL phage RF DNA was digested to completion 
with PstI. The ca. 8.2 kb linear fragment was gel 
purified and subsequently digested to completion with 
BglI. The BglI digest was extracted once with 
phenol:chloroform (1:1), the aqueous phase was ethanol 
precipitated, and the pellet was dissolved in TE (pH 8.0). An 
aliquot of this solution was used in a ligation reaction 
with the 138 bp BglI to PstI fragment as described above. 
The ethanol precipitated ligation products were used to 
transform XL1-Blue(TM) cells by electroporation and after 
1 hr growth in SOC at 37 °C, cells were plated on LB Ap 
plates. A phage population, POP2, was prepared from Ap^r 
colonies as described previously. 

Phage stocks obtained from individual plaques 
produced on titration of POP2 were tested by PCR for the 
presence of the complete ITI-D1-III gene fusion. PCR 
results indicate the entire fusion gene was present in 
seven of nine isolates tested. RF DNA from the seven 
isolates testing positive was subjected to restriction 
enzyme analysis. The complete sequence of the ITI-D1 
insertion into gene III was confirmed in four of the seven 
isolates by DNA sequence analysis. Phage isolates 
containing the ITI-D1-III fusion gene are called MA-ITI. 

3. Expression and display of ITI-D1. 

Expression of the ITI domain I-Gene III fusion 
protein and its display on the surface of phage were 
demonstrated by Western analysis and phage titer 
neutralization experiments. 

For Western analysis, aliquots of PEG-purified phage 
preparations containing up to 4 x 10^10 infective particles 
were subjected to electrophoresis on a 12.5% 
SDS-urea-polyacrylamide gel. Proteins were transferred to a sheet 
of Immobilon-P transfer membrane (Millipore, Bedford, MA) 
by electrotransfer. Western blots were developed using a 
rabbit anti-ITI serum (SALI87) which had previously been 
incubated with an E. coli extract, followed by goat 
anti-rabbit IgG conjugated to horseradish peroxidase (#401315, 
Calbiochem, La Jolla, CA). An immunoreactive protein with 
an apparent size of ca. 65-69 kD is detected in 
preparations of MA-ITI phage but not with preparations of the 
parental MA phage. The size of the immunoreactive 
protein is consistent with the expected size of the 
processed ITI-D1-III fusion protein (ca. 67 kD, as 
previously observed for the BPTI-III fusion protein). 

Rabbit anti-BPTI serum has been shown to block the 
ability of MK-BPTI phage to infect E. coli cells (Example 
II). To test for a similar effect of rabbit anti-ITI 
serum on the infectivity of MA-ITI phage, 10 µl aliquots 
of MA or MA-ITI phage were incubated in 100 µl reactions 
containing 10 µl aliquots of PBS, normal rabbit serum 
(NRS), or anti-ITI serum. After a three hour incubation 
at 37 °C, phage suspensions were titered to determine 
residual plaque-forming activity. These data are 
summarized in Table 211. Incubation of MA-ITI phage with rabbit 
anti-ITI serum reduces titers 10- to 100-fold, depending 
on initial phage titer. A much smaller decrease in phage 
titer (10 to 40%) is observed when MA-ITI phage are 
incubated with NRS. In contrast, the titer of the 
parental MA phage is unaffected by either NRS or anti-ITI 
serum. 
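
The two effects are reported in different units (fold reduction versus percent decrease), so it helps to put them on one scale. The conversion below is a minimal sketch; the function name is an illustrative assumption.

```python
def fold_reduction_from_percent_decrease(percent_decrease):
    """Convert a percent decrease in titer into a fold reduction."""
    remaining = 1.0 - percent_decrease / 100.0
    return 1.0 / remaining


# 10% and 40% are the NRS range quoted above; 90% and 99% correspond
# to the 10- and 100-fold reductions seen with anti-ITI serum.
for pct in (10, 40, 90, 99):
    fold = fold_reduction_from_percent_decrease(pct)
    print(f"{pct}% decrease = {fold:.1f}-fold reduction")
```

On this scale the NRS effect stays below 2-fold, well separated from the 10- to 100-fold reduction caused by the specific antiserum.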

Taken together, the results of the Western analysis 
and the phage-titer neutralization experiments are 
consistent with the expression of an ITI-D1-III fusion 
protein in MA-ITI phage, but not in the parental MA phage, 
such that ITI-specific epitopes are present on the phage 
surface. The ITI-specific epitopes are located with 
respect to III such that antibody binding to these 
epitopes prevents phage from infecting E. coli cells. 

4. Fractionation of MA-ITI phage bound to 
agarose-immobilized protease beads. 

To test if phage displaying the ITI-D1-III fusion 
protein interact strongly with the proteases human 
neutrophil elastase (HNE) or cathepsin-G, aliquots of 
display phage were incubated with agarose-immobilized HNE 
or cathepsin-G beads (HNE beads or Cat-G beads, 
respectively). The beads were washed and bound phage eluted by 
pH fractionation as described in Examples II and III. The 
progression in lowering pH during the elution was: pH 7.0, 
6.0, 5.5, 5.0, 4.5, 4.0, 3.5, 3.0, 2.5, and 2.0. 
Following elution and neutralization, the various input, wash, 
and pH elution fractions were titered. 

The results of several fractionations are summarized 
in Table 212 (EpiNE-7 or MA-ITI phage bound to HNE beads) 
and Table 213 (EpiC-10 or MA-ITI phage bound to Cat-G 
beads). For the two types of beads (HNE or Cat-G), the pH 
elution profiles obtained using the control display phage 
(EpiNE-7 or EpiC-10, respectively) were similar to those 
seen previously (Examples II and III). About 0.3% of the 
EpiNE-7 display phage applied to the HNE beads were eluted 
during the fractionation procedure and the elution 
profile had a maximum for elution at about pH 4.0. A 
smaller fraction, 0.02%, of the EpiC-10 phage applied to 
the Cat-G beads were eluted and the elution profile 
displayed a maximum near pH 5.5. 

The MA-ITI phage show no evidence of great affinity 
for either HNE or cathepsin-G immobilized on agarose 
beads. The pH elution profiles for MA-ITI phage bound to 
HNE or Cat-G beads show essentially monotonic decreases in 
phage recovered with decreasing pH. Further, the total 
fractions of the phage applied to the beads that were 
recovered during the fractionation procedures were quite 
low: 0.002% from HNE beads and 0.003% from Cat-G beads. 
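
The distinction drawn here, between a profile with an elution maximum and one that only declines, can be expressed as a simple test on the titers of successive pH fractions. The sketch below is illustrative: the function names are assumptions and both example profiles are hypothetical numbers, not data from Tables 212 or 213.

```python
# pH steps used in the fractionation, in order of elution.
PH_STEPS = [7.0, 6.0, 5.5, 5.0, 4.5, 4.0, 3.5, 3.0, 2.5, 2.0]


def elution_maximum(ph_steps, pfu):
    """Return the pH of the fraction containing the most phage."""
    best = max(range(len(pfu)), key=lambda i: pfu[i])
    return ph_steps[best]


def is_monotonic_decrease(pfu):
    """True if no fraction contains more phage than the one before it."""
    return all(later <= earlier for earlier, later in zip(pfu, pfu[1:]))


# Hypothetical profiles, for illustration only.
peaked = [100, 200, 400, 900, 1500, 2000, 1200, 500, 200, 100]
declining = [1000, 500, 300, 200, 100, 80, 50, 30, 20, 10]

print(elution_maximum(PH_STEPS, peaked), is_monotonic_decrease(peaked))
print(elution_maximum(PH_STEPS, declining), is_monotonic_decrease(declining))
```

A declining profile has its largest fraction at the first, highest-pH step, which is the pattern expected when phage are merely washing off rather than being released from a specific complex.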

Published values of K_i for inhibition of neutrophil 
elastase by the intact, large (M_r ca. 240,000) ITI protein 
range between 60 and 150 nM, and values between 20 and 6000 
nM have been reported for the inhibition of Cathepsin G by 
ITI (SWAI88, ODOM90). Our own measurements of pH 
fractionation of display phage bound to HNE beads show that phage 
displaying proteins with low affinity (> µM) for HNE are 
not bound by the beads while phage displaying proteins 
with greater affinity (nM) bind to the beads and are 
eluted at about pH 5. If the first Kunitz-type domain of 
the ITI light chain is entirely responsible for the 
inhibitory activity of ITI against HNE, and if this domain 
is correctly displayed on the MA-ITI phage, then it 
appears that the minimum affinity of an inhibitor for HNE 
that allows binding and fractionation of display phage on 
HNE beads is 50 to 100 nM. 

5. Alteration of the P1 region of ITI-D1. 

If ITI-D1 and EpiNE-7 assume the same configuration 
in solution as BPTI, then these two polypeptides have 
identical amino acid sequences in both the primary and 
secondary binding loops with the exception of four 
residues about the P1 position. For ITI-D1 the sequence 
for positions 15 to 20 is (position 15 in ITI-D1 
corresponds to position 36 in the UTI sequence of GEBH86): 
MET15, GLY16, MET17, THR18, SER19, ARG20. In EpiNE-7 the 
equivalent sequence is: VAL15, ALA16, MET17, PHE18, PRO19, 
ARG20. These two proteins appear to differ greatly in 
their affinities for HNE. To improve the affinity of 
ITI-D1 for HNE, the EpiNE-7 sequence shown above was 
incorporated into the ITI-D1 sequence at positions 15 through 20. 

The EpiNE-7 sequence was incorporated into the ITI-D1 
sequence in MA-ITI by cassette mutagenesis. The mutagenic 
cassette consisted of two synthetic 51 base 
oligonucleotides (top and bottom strands) which were annealed to make 
double stranded DNA containing an EagI overhang at the 5' 
end and a StyI overhang at the 3' end. The DNA sequence 
between the EagI and StyI overhangs is identical to the 
ITI-D1 sequence between these sites except at four codons: 
the codon for position 15, ATG (MET), was changed to GTC 
(VAL), the codon for position 16, GGA (GLY), was changed 
to GCT (ALA), the codon for position 18, ACC (THR), was 
changed to TTC (PHE), and the codon for position 19, AGC 
(SER), was changed to CCA (PRO). MA-ITI RF DNA was 
digested with EagI and StyI. The large, linear fragment 
was gel purified and used in a ligation with the mutagenic 
cassette described above. Ligation products were used to 
transform XL1-Blue(TM) cells as described previously. Phage 
stocks obtained from overnight cultures of Ap^r 
transductants were screened by PCR for incorporation of the 
altered sequence and the changes in the codons for 
positions 15, 16, 18, and 19 were confirmed by DNA 
sequencing. Phage isolates containing the ITI-D1-III 
fusion gene with the EpiNE-7 changes around the P1 
position are called MA-ITI-E7. 
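
The four codon changes can be checked against the two residue sequences given earlier for positions 15 to 20. The sketch below is illustrative only; the data structures and the helper name `apply_cassette` are assumptions, the codon table lists just the codons named in the text, and ATG is used for MET because it is the only methionine codon.

```python
# Codons named in the text; assignments follow the standard genetic code.
CODON_TABLE = {"ATG": "MET", "GTC": "VAL", "GGA": "GLY", "GCT": "ALA",
               "ACC": "THR", "TTC": "PHE", "AGC": "SER", "CCA": "PRO"}

# ITI-D1 residues at positions 15 to 20, as listed in the text.
ITI_D1 = {15: "MET", 16: "GLY", 17: "MET", 18: "THR", 19: "SER", 20: "ARG"}

# position -> (original codon, cassette codon)
CASSETTE = {15: ("ATG", "GTC"), 16: ("GGA", "GCT"),
            18: ("ACC", "TTC"), 19: ("AGC", "CCA")}


def apply_cassette(residues, cassette):
    """Return the residue map after applying the cassette codon changes."""
    mutated = dict(residues)
    for position, (old_codon, new_codon) in cassette.items():
        if CODON_TABLE[old_codon] != residues[position]:
            raise ValueError(f"codon {old_codon} does not encode position {position}")
        mutated[position] = CODON_TABLE[new_codon]
    return mutated


mutant = apply_cassette(ITI_D1, CASSETTE)
print(", ".join(f"{mutant[p]}{p}" for p in sorted(mutant)))
# VAL15, ALA16, MET17, PHE18, PRO19, ARG20
```

The result matches the EpiNE-7 sequence for positions 15 to 20, with positions 17 and 20 left unchanged.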

6. Fractionation of MA-ITI-E7 phage. 

To test if the changes at positions 15, 16, 18, and 
19 of the ITI-D1-III fusion protein influence binding of 
display phage to HNE beads, abbreviated pH elution 
profiles were measured. Aliquots of EpiNE-7, MA-ITI, and 
MA-ITI-E7 display phage were incubated with HNE beads for 
three hours at room temperature. The beads were washed 
and phage were eluted as described (Example III), except 
that only three pH elutions were performed: pH 7.0, 3.5, 
and 2.0. The results of these elutions are shown in Table 
214. 

Binding and elution of the EpiNE-7 and MA-ITI display 
phage were found to be as previously described. The total 
fraction of input phage recovered was high (0.4%) for EpiNE-7 
phage and low (0.001%) for MA-ITI phage. Further, the 
EpiNE-7 phage showed maximum phage elution in the pH 3.5 
fraction while the MA-ITI phage showed only a monotonic 
decrease in phage yields with decreasing pH, as seen 
above. 

The two strains of MA-ITI-E7 phage show increased
levels of binding to HNE beads relative to MA-ITI phage.
The total fraction of the input phage eluted from the
beads is 10-fold greater for both MA-ITI-E7 phage strains
than for MA-ITI phage (although still 40-fold lower than
for EpiNE-7 phage). Further, the pH elution profiles of the
MA-ITI-E7 phage strains show maximum elutions in the pH
3.5 fractions, similar to EpiNE-7 phage.
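
The fold differences quoted above can be checked against the
percentages given in the text. The following is a minimal
sketch only; the variable names are ours, and the MA-ITI-E7
value is derived from the stated 10-fold increase rather than
read from Table 214.

```python
# Fractions of input phage eluted from HNE beads (percent), as quoted in the text.
epine7_pct = 0.4
ma_iti_pct = 0.001

# MA-ITI-E7 is described as about 10-fold above MA-ITI (derived value, not tabulated).
ma_iti_e7_pct = 10 * ma_iti_pct

# Ratio of EpiNE-7 recovery to the derived MA-ITI-E7 recovery.
fold_below_epine7 = epine7_pct / ma_iti_e7_pct

print(f"MA-ITI-E7: about {ma_iti_e7_pct:.2f}% of input phage eluted")
print(f"EpiNE-7 / MA-ITI-E7 ratio: about {fold_below_epine7:.0f}-fold")
```

The derived ratio agrees with the 40-fold difference stated in
the paragraph above.
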
To further define the binding properties of MA-ITI-E7
phage, the extended pH fractionation procedure described
previously was performed using phage bound to HNE beads.
These data are summarized in Table 215. The pH elution
profile of EpiNE-7 display phage is as previously
described. In this more resolved pH elution profile, MA-ITI-E7
phage show a broad elution maximum centered around pH 5.
Once again, the total fraction of MA-ITI-E7 phage obtained
on pH elution from HNE beads was about 40-fold less than
that obtained using EpiNE-7 display phage.

The pH elution behavior of MA-ITI-E7 phage bound to
HNE beads is qualitatively similar to that seen using
BPTI[K15L]-III-MA phage. BPTI with the K15L mutation has
an affinity for HNE of about 3 x 10^-9 M. Assuming all else
remains the same, the pH elution profile for MA-ITI-E7
suggests that the affinity of the free ITI-DI-E7 domain
for HNE might be in the nM range. If this is the case,
the substitution of the EpiNE-7 sequence in place of the
ITI-DI sequence around the P1 region has produced a 20- to
50-fold increase in affinity for HNE (assuming K_d = 60 to
150 nM for the unaltered ITI-DI).
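
The 20- to 50-fold estimate follows directly from the quoted
affinities. The sketch below makes the arithmetic explicit; it
assumes, as the text does, that ITI-DI-E7 binds HNE about as
tightly as BPTI[K15L] (about 3 nM). Variable names are ours.

```python
# Affinities in nM, as quoted in the text.
kd_e7_assumed_nM = 3.0                 # assumed equal to BPTI[K15L] (about 3 x 10^-9 M)
kd_iti_di_range_nM = (60.0, 150.0)     # assumed range for unaltered ITI-DI

# Fold improvement = affinity constant before / affinity constant after.
fold_range = tuple(kd / kd_e7_assumed_nM for kd in kd_iti_di_range_nM)

print(f"Estimated increase in affinity: {fold_range[0]:.0f}- to {fold_range[1]:.0f}-fold")
```
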

If EpiNE-7 and ITI-DI-E7 have the same solution
structure, these proteins present the identical amino acid
sequences to HNE over the interaction surface. Despite
this similarity, EpiNE-7 exhibits a roughly 1000-fold
greater affinity for HNE than does ITI-DI-E7. Again
assuming similar structure, this observation highlights
the importance of non-contacting secondary residues in
modulating interaction strengths.

Native ITI light chain is glycosylated at two
positions, SER10 and ASN45 (GEBH86). Removal of the
glycosaminoglycan chains has been shown to decrease the
affinity of the inhibitor for HNE about 5-fold (SELL87).
Another potentially important difference between EpiNE-7
and ITI-DI-E7 is that of net charge. The changes in BPTI
that produce EpiNE-7 reduce the total charge on the
molecule from +6 to +1. Sequence differences between
EpiNE-7 and ITI-DI-E7 further reduce the charge on the
latter to -1. Furthermore, the change in net charge
between these two molecules arises from sequence
differences occurring in the central portions of the molecules.
Position 26 is LYS in EpiNE-7 and THR in ITI-DI-E7,
while at position 31 these residues are GLN and GLU,
respectively. These changes in sequence not only alter
the net charge on the molecules but also position a
negatively charged residue close to the interaction surface in
ITI-DI-E7. It may be that the occurrence of a negative
charge at position 31 (which is not found in any other of
the HNE inhibitors described here) destabilizes the
inhibitor-protease interaction.

EXAMPLE X 

GENERATION OF A VARIEGATED ITI-DI POPULATION

The following is a hypothetical example demonstrating
how to obtain a derivative of ITI having high affinity for
HNE.

The results of Example IX demonstrate that the nature
of the protein sequence around the P1 position in ITI-DI
can significantly influence the strength of the
interaction between ITI-DI and HNE. While incorporation of the
EpiNE-7 sequence increases the affinity of ITI-DI for HNE,
it is unlikely that this particular sequence is optimal
for binding.

We generate a large population of potential binding
proteins having differing sequences in the P1 region of
ITI-DI using the oligonucleotide ITIMUT. ITIMUT is
designed to incorporate variegation in ITI-DI at the six
positions about and including the P1 residue: 13, 15, 16,
17, 18, and 19. ITIMUT is synthesized as one long (top
strand) 73 base oligonucleotide and one shorter (24 base)
bottom strand oligonucleotide. The top strand sequence
extends from position 770 (G) to position 842 (G) in the
sequence of TREB86. This sequence includes the codons for
the positions of variegation as well as the recognition
sequences for the flanking restriction enzymes Eag I (778
to 783) and Sty I (829 to 834). The bottom strand
oligonucleotide comprises the complement of the sequence
from positions 819 to 842.

To generate the mutagenic cassette, the top and
bottom strand oligonucleotides are annealed and the
resulting duplex is completed in an extension reaction
using DNA polymerase. Following digestion of the 73 bp
dsDNA with Eag I and Sty I, the purified 51 bp mutagenic
cassette is ligated with the large linear fragment
obtained from a similar digestion of MA-ITI RF DNA.
Ligation products are used to transform competent cells by
electroporation, and phage stocks produced from Ap^r
transductants are analyzed for the presence and nature of
novel sequences as described previously.
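
The oligonucleotide and cassette lengths quoted above can be
recovered from the TREB86 coordinates. The following sketch is
illustrative only. It assumes (this is not stated in the text)
that each enzyme cuts the top strand after the first base of
its recognition sequence (Eag I: C^GGCCG; Sty I: C^CWWGG); the
names used are ours.

```python
# Coordinates in the TREB86 numbering quoted in the text (inclusive).
TOP = (770, 842)      # top-strand oligonucleotide
BOTTOM = (819, 842)   # region complemented by the bottom-strand oligonucleotide
EAG_I = (778, 783)    # Eag I recognition sequence
STY_I = (829, 834)    # Sty I recognition sequence


def span_len(span):
    """Number of bases in an inclusive coordinate range."""
    start, end = span
    return end - start + 1


# Assumption: both enzymes cut the top strand after the first base of the site,
# so the excised top strand runs from the second Eag I base to the first Sty I base.
cassette_top = (EAG_I[0] + 1, STY_I[0])

print("top strand:", span_len(TOP), "bases")
print("bottom strand / annealed overlap:", span_len(BOTTOM), "bases")
print("cassette after digestion:", span_len(cassette_top), "bases")
```

Under that assumption the computed lengths match the 73 base,
24 base, and 51 bp figures given in the text.
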

The variegation in the ITIMUT cassette is confined to
the codons for the six positions in ITI-DI (13, 15, 16,
17, 18, and 19), and employs three different nucleotide
mixes: N, R, and S. For this mutagenesis, the composition
of the N-mix is 36% A, 17% C, 23% G, and 24% T, and
corresponds to the N-mix composition in the optimized NNS
codon described elsewhere. The R-mix composition is 50% A,
50% G, and the S-mix composition is 50% C, 50% G.

The codon for ITI-DI position 13 (CCC, PRO) is
changed to SNG in ITIMUT. This codon encodes the eight
residues PRO, VAL, GLU, ALA, GLY, LEU, GLN, and ARG. The
encoded group includes the parental residue (PRO) as well
as the more commonly observed variants at the position,
ARG and LEU (see Table 15), and also provides for the
occurrence of acidic (GLU), large polar (GLN) and nonpolar
(VAL), and small (ALA, GLY) residues.

The codons for positions 15 and 17 (ATG, MET) are
changed to the optimized NNS codon. All 20 natural amino
acid residues and a translation termination are allowed.

The codon for position 16 (GGA, GLY) is changed to
RNS in ITIMUT. This codon encodes the twelve amino acids
GLY, ALA, ASP, GLU, VAL, MET, ILE, THR, SER, ARG, ASN, and
LYS. The encoded group includes the most commonly
observed residues at this position, ALA and GLY, and
provides for the occurrence of both positively (ARG, LYS)
and negatively (GLU, ASP) charged amino acids. Large
nonpolar residues are also included (ILE, MET, VAL).

Finally, at positions 18 and 19, the ITI-DI sequence
is changed from ACC·AGC (THR·SER) to NNT·NNT. The NNT
codon encodes the fifteen amino acid residues PHE, SER,
TYR, CYS, LEU, PRO, HIS, ARG, ILE, THR, ASN, VAL, ALA,
ASP, and GLY. This group includes the parental residues;
the further advantages of the NNT codon have been
discussed elsewhere.
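
The residue sets listed for the SNG, NNS, RNS, and NNT codons
can be reproduced by enumerating each variegated codon against
the standard genetic code. The sketch below is illustrative
only; the function and table names are ours, and the nucleotide
mixes are treated simply as sets of allowed bases (their
proportions are ignored).

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Allowed bases for each mix symbol used in ITIMUT.
MIX = {"N": "ACGT", "S": "CG", "R": "AG", "T": "T", "G": "G"}


def encoded(pattern):
    """Return (number of DNA codons, sorted one-letter residues) for a codon pattern."""
    codons = ["".join(c) for c in product(*(MIX[p] for p in pattern))]
    return len(codons), sorted({CODON_TABLE[c] for c in codons})


for pattern in ("SNG", "NNS", "RNS", "NNT"):
    n_codons, residues = encoded(pattern)
    n_amino = len([r for r in residues if r != "*"])
    print(pattern, n_codons, "codons,", n_amino, "amino acids:", "".join(residues))
```

In the printed residue strings, "*" denotes a stop codon, which
among these four patterns occurs only for NNS.
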

The ITIMUT DNA sequence encodes a total of:
     8 * 20 * 12 * 20 * 15 * 15 = 8,640,000
different protein sequences in a total of:
     2^25 = 33,554,432
different DNA sequences. The total number of protein
sequences encoded by ITIMUT is only 7.4-fold fewer than
the total possible number of natural sequences obtained
from variation at six positions (20^6 = 6.4 x 10^7).
However, that full degree of variation in protein sequence
requires a minimum of 1.07 x 10^9 (NNS^6 = 2^30) DNA
sequences, a 32-fold greater number than that comprising
ITIMUT. Thus, ITIMUT is an efficient vehicle for the
generation of a large and diverse population of potential
binding proteins.
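
The library sizes above can be verified with a few lines of
arithmetic. This is a minimal sketch; the list names are ours,
and, as in the text, stop codons in NNS are not counted among
the protein sequences.

```python
from math import prod

# Positions 13, 15, 16, 17, 18, 19 use SNG, NNS, RNS, NNS, NNT, NNT respectively.
amino_acids_per_position = [8, 20, 12, 20, 15, 15]
codons_per_position = [8, 32, 16, 32, 16, 16]

n_protein = prod(amino_acids_per_position)   # distinct protein sequences
n_dna = prod(codons_per_position)            # distinct DNA sequences

all_protein = 20 ** 6                        # every amino acid at six positions
all_nns_dna = 32 ** 6                        # NNS at all six positions

print(f"protein sequences: {n_protein:,}")
print(f"DNA sequences: {n_dna:,} (2^25 = {2 ** 25:,})")
print(f"fraction of 20^6 covered: 1 in {all_protein / n_protein:.1f}")
print(f"NNS^6 / ITIMUT DNA sequences: {all_nns_dna // n_dna}-fold")
```
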

EXAMPLE XI 

DEVELOPMENT AND SELECTION OF BPTI MUTANTS FOR
BINDING TO HORSE HEART MYOGLOBIN (HHMB)

The following example is hypothetical and illustrates 
alternative embodiments of the invention not given in 
other examples. 

HHMb is chosen as a typical protein target; any other
protein could be used. HHMb satisfies all of the criteria
for a target: 1) it is large enough to be applied to an
affinity matrix, 2) after attachment it is not reactive,
and 3) after attachment there is sufficient unaltered
surface to allow specific binding by PBDs.

The essential information for HHMb is known: 1) HHMb
is stable at least up to 70°C, between pH 4.4 and 9.3, 2)
HHMb is stable up to 1.6 M guanidinium Cl, 3) the pI of
HHMb is 7.0, 4) for HHMb, M_r = 16,000, 5) HHMb requires
haem, 6) HHMb has no proteolytic activity.

In addition, the following information about HHMb
and other myoglobins is available: 1) the sequence of
HHMb is known, 2) the 3D structure of sperm whale
myoglobin is known; HHMb has 19 amino acid differences and
it is generally assumed that the 3D structures are almost
identical, 3) HHMb has no enzymatic activity, 4) HHMb is
not toxic.

We set the specifications of an SBD as:
1) T = 25°C; 2) pH = 8.0; 3) Acceptable solutes: (A) for
binding: i) phosphate, as buffer, 0 to 20 mM, and ii)
KCl, 10 mM; (B) for column elution: i) phosphate, as
buffer, 0 to 30 mM, ii) KCl, up to 5 M, and iii)
guanidinium Cl, up to 0.8 M; 4) Acceptable K_d < 1.0 x 10^-8 M.

As stated in Sec. III.B, the residues to be varied
are picked, in part, through the use of interactive
computer graphics to visualize the structures. In this
example, all residue numbers refer to BPTI. We pick a
set of residues that forms a surface such that all
residues can contact one target molecule. Information
that we refer to during the process of choosing residues
to vary includes: 1) the 3D structure of BPTI, 2) solvent
accessibility of each residue as computed by the method
of Lee and Richards (LEEB71), 3) a compilation of
sequences of other proteins homologous to BPTI, and 4)
knowledge of the structural nature of different amino
acid types.

Tables 16 and 34 indicate which residues of BPTI: a)
have substantial surface exposure, and b) are known to
tolerate other amino acids in other closely related
proteins. We use interactive computer graphics to pick
sets of eight to twenty residues that are exposed and
variable and such that all members of one set can touch a
molecule of the target material at one time. If BPTI has
a small amino acid at a given residue, that amino acid
may not be able to contact the target simultaneously with
all the other residues in the interaction set, but a
larger amino acid might well make contact. A charged
amino acid might affect binding without making direct
contact. In such cases, the residue should be included
in the interaction set, with a notation that larger
residues might be useful. In a similar way, large amino
acids near the geometric center of the interaction set
may prevent residues on either side of the large central
residue from making simultaneous contact. If a small
amino acid, however, were substituted for the large amino
acid, then the surface would become flatter and residues
on either side could make simultaneous contact. Such a
residue should be included in the interaction set with a
notation that small amino acids may be useful.

Table 35 was prepared from standard model parts and
shows the maximum span between Cβ and the tip of each type
of side group. Cβ is used because it is rigidly attached
to the protein main-chain; rotation about the Cα-Cβ bond
is the most important degree of freedom for determining
the location of the side group.

Table 34 indicates five surfaces that meet the given
criteria. The first surface comprises the set of residues
that actually contacts trypsin in the complex of trypsin
with BPTI as reported in the Brookhaven Protein Data Bank
entry "1TPA". This set is indicated by the number "1".
The exposed surface of the residues in this set (taken
from Table 16) totals 1148 Å^2. Although this is not
strictly the area of contact between BPTI and trypsin, it
is approximately the same.

Other surfaces, numbered 2 to 5, were picked by
first picking one exposed, variable residue and then
picking neighboring residues until a surface was defined.
The choice of sets of residues shown in Table 34 is in no
way exhaustive or unique; other sets of variable, surface
residues can be picked. Set #2 is shown in stereo view,
Figure 14, including the α carbons of BPTI, the disulfide
linkages, and the side groups of the set. We take the
orientation of BPTI in Figure 14 as a standard orientation
and hereinafter refer to K15 as being at the top of the
molecule, while the carboxy and amino termini are at the
bottom.

Solvent accessibilities are useful, easily tabulated
indicators of a residue's exposure. Solvent
accessibilities must be used with some caution; small amino
acids are under-represented and large amino acids
over-represented. The user must consider what the solvent
accessibility of a different amino acid would be when
substituted into the structure of BPTI.

To create specific binding between a derivative of
BPTI and HHMb, we will vary the residues in set #2. This
set includes the twelve principal residues 17(R), 19(I),
21(Y), 27(A), 28(G), 29(L), 31(Q), 32(T), 34(V), 48(A),
49(E), and 52(M) (Sec. III.B). None of the residues in
set #2 is completely conserved in the sample of sequences
reported in Table 34; thus we can vary them with a high
probability of retaining the underlying structure.
Independent substitution at each of these twelve residues
of the amino acid types observed at that residue would
produce approximately 4.4 x 10^9 amino acid sequences and
the same number of surfaces.

BPTI is a very basic protein. This property has
been used in isolating and purifying BPTI and its
homologues, so that the high frequency of arginine and lysine
residues may reflect bias in isolation and is not
necessarily required by the structure. Indeed, SCI-III from
Bombyx mori contains seven more acidic than basic groups
(SASA84).

Residue 17 is highly variable and fully exposed and
can contain R, K, A, Y, H, F, L, M, T, G, Y, P, or S.
All types of amino acids are seen: large, small, charged,
neutral, and hydrophobic. That no acidic groups are
observed may be due to bias in the sample.

Residue 19 is also variable and fully exposed,
containing P, R, I, S, K, Q, and L.

Residue 21 is not very variable, containing F or Y
in 31 of 33 cases and I and W in the remaining cases.
The side group of Y21 fills the space between T32 and the
main chain of residues 47 and 48. The OH at the tip of
the Y side group projects into the solvent. Clearly one
can vary the surface by substituting Y or F so that the
surface is either hydrophobic or hydrophilic in that
region. It is also possible that the other aromatic
amino acid (viz. H) or the other hydrophobics (L, M, or
V) might be tolerated.

Residue 27 most often contains A, but S, K, L, and T
are also observed. On structural grounds, this residue
will probably tolerate any hydrophilic amino acid and
perhaps any amino acid.

Residue 28 is G in BPTI. This residue is in a turn,
but is not in a conformation peculiar to glycine. Six
other types of amino acids have been observed at this
residue: K, N, Q, R, H, and N. Small side groups at this
residue might not contact HHMb simultaneously with
residues 17 and 34. Large side groups could interact
with HHMb at the same time as residues 17 and 34.
Charged side groups at this residue could affect binding
of HHMb on the surface defined by the other residues of
the principal set. Any amino acid, except perhaps P,
should be tolerated.

Residue 29 is highly variable, most often containing
L. This fully exposed position will probably tolerate
almost any amino acid except, perhaps, P.

Residues 31, 32, and 34 are highly variable, exposed,
and in extended conformations; any amino acid should be
tolerated.

Residues 48 and 49 are also highly variable and
fully exposed; any amino acid should be tolerated.

Residue 52 is in an α helix. Any amino acid, except
perhaps P, might be tolerated.

Now we consider possible variation of the secondary
set (Sec. 13.1.2) of residues that are in the
neighborhood of the principal set. Neighboring residues that
might be varied at later stages include 9(P), 11(T),
15(K), 16(A), 18(I), 20(R), 22(F), 24(N), 26(K), 35(Y),
47(S), 50(D), and 53(R).

Residue 9 is highly variable, extended, and exposed.
Residue 9 and residues 48 and 49 are separated by a bulge
caused by the ascending chain from residue 31 to 34. For
residue 9 and residues 48 and 49 to contribute
simultaneously to binding, either the target must have a
groove into which the chain from 31 to 34 can fit, or all
three residues (9, 48, and 49) must have large amino
acids that effectively reduce the radius of curvature of
the BPTI derivative.

Residue 11 is highly variable, extended, and exposed.
Residue 11, like residue 9, is slightly far from the
surface defined by the principal residues and will
contribute to binding in the same circumstances.

Residue 15 is highly varied. The side group of
residue 15 points away from the face defined by set #2.
Changes of charge at residue 15 could affect binding on
the surface defined by residue set #2.

Residue 16 is varied but points away from the
surface defined by the principal set. Changes in charge
at this residue could affect binding on the face defined
by set #2.

Residue 18 is I in BPTI. This residue is in an
extended conformation and is exposed. Five other amino
acids have been observed at this residue: M, F, L, V, and
T. Only T is hydrophilic. The side group points directly
away from the surface defined by residue set #2.
Substitution of charged amino acids at this residue could
affect binding at the surface defined by residue set #2.

Residue 20 is R in BPTI. This residue is in an
extended conformation and is exposed. Four other amino
acids have been observed at this residue: A, S, L, and Q.
The side group points directly away from the surface
defined by residue set #2. Alteration of the charge at
this residue could affect binding at the surface defined by
residue set #2.

Residue 22 is only slightly varied, being Y, F, or H 
in 30 of 33 cases. Nevertheless, A, N, and S have been 
observed at this residue. Amino acids such as L, M, I, 
or Q could be tried here. Alterations at residue 22 may 
affect the mobility of residue 21; changes in charge at 
residue 22 could affect binding at the surface defined by 
residue set #2. 

Residue 24 shows some variation, but probably cannot
interact with one molecule of the target
simultaneously with all the residues in the principal set.
Variation in charge at this residue might have an effect
on binding at the surface defined by the principal set.

Residue 26 is highly varied and exposed. Changes in
charge may affect binding at the surface defined by
residue set #2; substitutions may affect the mobility of
residue 27, which is in the principal set.

Residue 35 is most often Y; W has also been observed.
The side group of 35 is buried, but substitution of F or
W could affect the mobility of residue 34.

Residue 47 is always T or S in the sequence sample
used. The Oγ probably accepts a hydrogen bond from
the NH of residue 50 in the alpha helix. Nevertheless,
there is no overwhelming steric reason to preclude other
amino acid types at this residue. In particular, other
amino acids the side groups of which can accept hydrogen
bonds, viz. N, D, Q, and E, may be acceptable here.

Residue 50 is often an acidic amino acid, but other 
amino acids are possible. 

Residue 53 is often R, but other amino acids have 
been observed at this residue. Changes of charge may 
affect binding to the amino acids in interaction set #2. 

Stereo Figure 14 shows the residues in set #2, plus
R39. From Figure 14, one can see that R39 is on the
opposite side of BPTI from the surface defined by the
residues in set #2. Therefore, variation at residue 39
at the same time as variation of some residues in set #2
is much less likely to improve binding that occurs along
surface #2 than is variation of the other residues in set
#2.

In addition to the twelve principal residues and 13
secondary residues, there are two other residues, 30(C)
and 33(F), involved in surface #2 that we will probably
not vary, at least not until late in the procedure.
These residues have their side groups buried inside BPTI
and are conserved. Changing these residues does not
change the surface nearly so much as does changing
residues in the principal set. These buried, conserved
residues do, however, contribute to the surface area of
surface #2. The surface of residue set #2 is comparable
to the area of the trypsin-binding surface. Principal
residues 17, 19, 21, 27, 28, 29, 31, 32, 34, 48, 49, and
52 have a combined solvent-accessible area of 946.9 Å^2.
Secondary residues 9, 11, 15, 16, 18, 20, 22, 24, 26, 35,
47, 50, and 53 have a combined surface of 1041.7 Å^2.
Residues 30 and 33 have exposed surface totaling 38.2 Å^2.
Thus the three groups' combined surface is 2026.8 Å^2.
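
As a quick check of the totals above, the three quoted areas
can be summed and the principal set compared with the 1148 Å^2
quoted earlier for the trypsin-contacting set. The sketch is
illustrative; the dictionary keys and variable names are ours.

```python
# Solvent-accessible areas in square Angstroms, as quoted in the text.
areas = {
    "principal (12 residues)": 946.9,
    "secondary (13 residues)": 1041.7,
    "buried, conserved (30 and 33)": 38.2,
}
trypsin_contact_area = 1148.0  # set "1", quoted from Table 16 in the text

total_area = round(sum(areas.values()), 1)
principal_vs_trypsin = areas["principal (12 residues)"] / trypsin_contact_area

print(f"combined surface: {total_area} square Angstroms")
print(f"principal set / trypsin-contact set: {principal_vs_trypsin:.2f}")
```
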

Residue 30 is C in BPTI and is conserved in all
homologous sequences. It should be noted, however, that
C14/C38 is conserved in all natural sequences, yet Marks
et al. (MARK87) showed that changing both C14 and C38 to
A,A or T,T yields a functional trypsin inhibitor. Thus
it is possible that BPTI-like molecules will fold if C30
is replaced.

Residue 33 is F in BPTI and in all homologous
sequences. Visual inspection of the BPTI structure
suggests that substitution of Y, M, H, or L might be
tolerated.

Having identified twenty residues that define a
possible binding surface, we must choose some to vary
first. Assuming a hypothetical affinity separation
sensitivity, C_sensi, of 1 in 4 x 10^8, we decide to vary six
residues (leaving some margin for error in the actual
base composition of variegated bases). To obtain maximal
recognition, we choose residues from the principal set
that are as far apart as possible. Table 36 shows the
distances between the β carbons of residues in the
principal and peripheral set. R17 and V34 are at one end
of the principal surface. Residues A27, G28, L29, A48,
E49, and M52 are at the other end, about twenty Angstroms
away; of these, we will vary residues 17, 27, 29, 34, and
48. Residues 28, 49, and 52 will be varied at later
rounds.

Of the remaining principal residues, 21 is left to
later variations. Among residues 19, 31, and 32, we
arbitrarily pick 19 to vary.

Unlimited variation of six residues produces 6.4 x 10^7
amino acid sequences. By hypothesis, C_sensi is 1 in
4 x 10^8. Table 37 shows the programmed variegation at the
chosen residues. The parental sequence is present as 1
part in 5.5 x 10^7, but the least favored sequences are
present at only 1 part in 4.2 x 10^9. Among
single-amino-acid substitutions from the PPBD, the least favored is
F17-I19-A27-L29-V34-A48 and has a calculated abundance of
1 part in 1.6 x 10^8. Using the optimal qfk codon, we can
recover the parental sequence and all one-amino-acid
substitutions to the PPBD if actual nt compositions come
within 5% of programmed compositions. The number of
transformants is M_ntv = 1.0 x 10^9 (also by hypothesis);
thus we will produce most of the programmed sequences.
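
One way to see how the abundances above relate to the number of
transformants is a simple Poisson estimate of whether a given
sequence occurs at least once in the library. This model is our
assumption and is not part of the procedure described in the
text; the names used are ours.

```python
from math import exp

M_NTV = 1.0e9  # number of independent transformants (by hypothesis, from the text)

# Abundances quoted in the text, expressed as fractions of the population.
abundance = {
    "parental sequence": 1 / 5.5e7,
    "least-favored single substitution": 1 / 1.6e8,
    "least-favored sequence overall": 1 / 4.2e9,
}


def p_present(fraction, n_transformants=M_NTV):
    """Poisson estimate of P(at least one transformant carries the sequence)."""
    return 1.0 - exp(-n_transformants * fraction)


for name, fraction in abundance.items():
    expected_copies = M_NTV * fraction
    print(f"{name}: about {expected_copies:.2f} expected copies, "
          f"P(present) = {p_present(fraction):.3f}")
```

Under this model the parental sequence and every single
substitution are very likely to be present, while the rarest
programmed sequences are present only some of the time.
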
The residue numbers of the preceding section refer
to mature BPTI (R1-P2-...-A58). Table 25 has
residue numbers referring to the pre-M13CP-BPTI protein;
all mature BPTI sequence numbers have been increased by
the length of the signal sequence, i.e. 23. Thus in
terms of the pre-OSP-PBD residue numbers, we wish to vary
residues 40, 42, 50, 52, 57, and 71. A DNA subsequence
containing all these codons is found between the (Apa I/
Dra II/Pss I) sites at base 191 and the Sph I site at base
309 of the osp-pbd gene. Among Apa I, Dra II, and Pss I,
Apa I is preferred because it recognizes six bases without
any ambiguity. Dra II and Pss I, on the other hand,
recognize six bases with two-fold ambiguity at two of the
bases. The vgDNA will contain more Dra II and Pss I
recognition sites at the varied locations than it will
contain Apa I recognition sites. The unwanted extraneous
cutting of the vgDNA by Apa I and Sph I will eliminate a
few sequences from our population. This is a minor
problem, but by using the more specific enzyme (Apa I), we
minimize the unwanted effects. The sequence shown in
Table 37 illustrates an additional way in which
gratuitous restriction sites can be avoided in some cases. The
osp-ipbd gene had the codon GGC for G51; because we are
varying both residue 50 and 52, it is possible to obtain
an Apa I site. If we change the glycine codon to GGT, the
Apa I site can no longer arise. Apa I recognizes the DNA
sequence (GGGCC/C).
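
The renumbering and the effect of the GGC to GGT change can be
illustrated with a short enumeration. This is a sketch under
simplifying assumptions that are ours: the varied codons 50 and
52 are modeled as NNK (any base, any base, then G or T) in place
of the weighted qfk mix, and only the three-codon window 50-51-52
is examined, not the fixed flanking sequence.

```python
from itertools import product

SIGNAL_LENGTH = 23
mature_residues = [17, 19, 27, 29, 34, 48]
pre_residues = [r + SIGNAL_LENGTH for r in mature_residues]
print("pre-protein numbering:", pre_residues)

APA_I_SITE = "GGGCCC"

# Assumption: varied codons are NNK, a stand-in for the qfk mix.
VARIED_CODONS = ["".join(c) for c in product("ACGT", "ACGT", "GT")]


def can_create_apa_i(gly_codon):
    """True if some choice of codons 50 and 52 places an Apa I site across codon 51."""
    return any(
        APA_I_SITE in codon50 + gly_codon + codon52
        for codon50 in VARIED_CODONS
        for codon52 in VARIED_CODONS
    )


print("GGC at codon 51 can give an Apa I site:", can_create_apa_i("GGC"))
print("GGT at codon 51 can give an Apa I site:", can_create_apa_i("GGT"))
```

With GGC, a codon 50 ending in G followed by a codon 52
beginning with CC completes the site; with GGT no combination
within the window does.
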

Each piece of dsDNA to be synthesized needs six to
eight bases added at either end to allow cutting with
restriction enzymes and is shown in Table 37. The first
synthetic base (before cutting with Apa I and Sph I) is 184
and the last is 322. There are 142 bases to be
synthesized. The center of the piece to be synthesized
lies between Q54 and V57. The overlap cannot include
varied bases, so we choose bases 245 to 256 as the
overlap, which is 12 bases long. Note that the codon for
F56 has been changed to TTC to increase the GC content of
the overlap. The amino acids that are being varied are
marked as X with a plus over them. Codons 57 and 71 are
synthesized on the sense (bottom) strand. The design
calls for "qfk" in the antisense strand, so that the
sense strand contains (from 5' to 3') a) equal parts C and
A (i.e. the complement of k), b) (0.40 T, 0.22 A, 0.22 C,
and 0.16 G) (i.e. the complement of f), and c) (0.26 T,
0.26 A, 0.30 C, and 0.18 G) (i.e. the complement of q).
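
The relation between the mixes on the two strands can be made
explicit by complementing each nucleotide mix base by base. The
sketch below recovers the f, q, and k compositions implied by
the sense-strand mixes quoted above; treating mix (c) as the
complement of q is our reading of the text, and the names are
ours.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_mix(mix):
    """Map each base of a nucleotide mix to its Watson-Crick partner."""
    return {COMPLEMENT[base]: fraction for base, fraction in mix.items()}


# Sense-strand mixes, 5' to 3', as quoted in the text.
sense_first = {"C": 0.50, "A": 0.50}                          # complement of k
sense_second = {"T": 0.40, "A": 0.22, "C": 0.22, "G": 0.16}   # complement of f
sense_third = {"T": 0.26, "A": 0.26, "C": 0.30, "G": 0.18}    # taken as complement of q

k_mix = complement_mix(sense_first)
f_mix = complement_mix(sense_second)
q_mix = complement_mix(sense_third)

print("q:", q_mix)
print("f:", f_mix)
print("k:", k_mix)
```

Because the codon is read 5' to 3' on the opposite strand, the
order of the three positions is reversed as well as
complemented, which is why the complement of k comes first.
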

Each residue that is encoded by "qfk" has 21 possible
outcomes, each of the amino acids plus stop. Table 12
gives the distribution of amino acids encoded by "qfk",
assuming 5% errors. The abundance of the parental
sequence is the product of the abundances of R x I x A x
L x V x A. The abundance of the least-favored sequence
is 1 in 4.2 x 10^9.

Olig#27 and olig#28 are annealed and extended with
Klenow fragment and all four (nt)TPs. Both the ds
synthetic DNA and RF pLG7 DNA are cut with both Apa I and
Sph I. The cut DNA is purified and the appropriate pieces
ligated (see Sec. 14.1) and used to transform competent
PE383 (Sec. 14.2). In order to generate a sufficient
number of transformants, V_c is set to 5000 ml.

1) culture E. coli in 5.0 l of LB broth at 37°C until
cell density reaches 5 x 10^7 to 7 x 10^7 cells/ml,

2) chill on ice for 65 minutes, centrifuge the cell
suspension at 4000 g for 5 minutes at 4°C,

3) discard supernatant; resuspend the cells in 1667 ml
of an ice-cold, sterile solution of 60 mM CaCl2,

4) chill on ice for 15 minutes, and then centrifuge at
4000 g for 5 minutes at 4°C,

5) discard supernatant; resuspend cells in 2 x 400 ml of
ice-cold, sterile 60 mM CaCl2; store cells at 4°C
for 24 hours,

6) add DNA in ligation, or TE buffer; mix and store on
ice for 30 minutes; 20 ml of solution containing 5
µg/ml of DNA is used,

7) heat shock cells at 42°C for 90 seconds,

8) add 200 ml LB broth and incubate at 37°C for 1 hour,

9) add the culture to 2.0 l of LB broth containing
ampicillin at 35-100 µg/ml and culture for 2 hours at
37°C,

10) centrifuge at 8000 g for 20 minutes at 4°C,

11) discard supernatant, resuspend cells in 50 ml of LB
broth plus ampicillin and incubate 1 hour at 37°C,

12) plate cells on LB agar containing ampicillin,

13) harvest virions by method of Salivar et al. (SALI64).

The heat shock of step (7) can be done by dividing the
200 ml into 100 aliquots of 200 µl in 1.5 ml plastic
Eppendorf tubes. It is possible to optimize the heat shock for
other volumes and kinds of container. It is important to:
a) use all or nearly all the vgDNA synthesized in
ligation (this will require large amounts of pLG7 backbone), b)
use all or nearly all the ligation mixture to transform
cells, and c) culture all or nearly all the transformants
at high density. These measures are directed at
maintaining diversity.

IPTG is added to the growth medium at 2.0 mM (the
optimal level) and virions are harvested in the usual way.
It is important to collect virions in a way that samples
all or nearly all the transformants. Because F- cells are
used in the transformation, multiple infections do not
pose a problem.

HHMb has a pI of 7.0 and we carry out chromatography
at pH 8.0 so that HHMb is slightly negative while BPTI and
most of its mutants are positive. HHMb is fixed (Sec.
V.F) to a 2.0 ml column on Affi-Gel 10 (™) or Affi-Gel
15 (™) at 4.0 mg/ml support matrix, the same density that
is optimal for a column supporting trp.

We note that charge repulsion between BPTI and HHMb
should not be a serious problem and does not impose any
constraints on ions or solutes allowed as eluants.
Neither BPTI nor HHMb has special requirements that
constrain choice of eluants. The eluant of choice is KCl
in varying concentrations.

To remove variants of BPTI with strong,
indiscriminate binding for any protein or for the support
matrix, we pass the variegated population of virions over
a column that supports bovine serum albumin (BSA) before
loading the population onto the {HHMb} column.
Affi-Gel 10 (™) or Affi-Gel 15 (™) is used to immobilize BSA at
the highest level the matrix will support. A 10.0 ml
column is loaded with 5.0 ml of Affi-Gel-linked-BSA; this
column, called {BSA}, has V_v = 5.0 ml. The variegated
population of virions containing 10^12 pfu in 1 ml (0.2 x
V_v) of 10 mM KCl, 1 mM phosphate, pH 8.0 buffer is applied
to {BSA}. We wash {BSA} with 4.5 ml (0.9 x V_v) of 50 mM
KCl, 1 mM phosphate, pH 8.0 buffer. The wash with 50 mM
salt will elute virions that adhere slightly to BSA but
not virions with strong binding. The pooled effluent of
the {BSA} column is 5.5 ml of approximately 13 mM KCl.

The column {HHMb} is first blocked by treatment with
10^11 virions of M13(am429) in 100 µl of 10 mM KCl buffered
to pH 8.0 with phosphate; the column is washed with the
same buffer until OD260 returns to base line or 2 x V_v
have passed through the column, whichever comes first.
The pooled effluent from {BSA} is added to {HHMb} in 5.5
ml of 13 mM KCl, 1 mM phosphate, pH 8.0 buffer. The
column is eluted in the following way:

1) 10 mM KCl buffered to pH 8.0 with phosphate, until
optical density at 280 nm falls to base line or 2 x
V_v, whichever is first (effluent discarded),

2) a gradient of 10 mM to 2 M KCl in 3 x V_v, pH held at
8.0 with phosphate (30 x 100 µl fractions),

3) a gradient of 2 M to 5 M KCl in 3 x V_v, phosphate
buffer to pH 8.0 (30 x 100 µl fractions),

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl in
2 x V_v, with phosphate buffer to pH 8.0 (20 x 100 µl
fractions), and

5) constant 5 M KCl plus 0.8 M guanidinium Cl in 1 x V_v,
with phosphate buffer to pH 8.0 (10 x 100 µl
fractions).

In addition to the elution fractions, a sample is removed
from the column and used as an inoculum for phage-sensitive
Sup- cells (Sec. V). A sample of 4 µl from each fraction is
plated on phage-sensitive Sup- cells. Fractions that yield
too many colonies to count are replated at higher dilution.
An approximate titre of each fraction is calculated.
Starting with the last fraction and working toward the
first fraction that was titered, we pool fractions until
approximately 10^9 phage are in the pool, i.e. about 1 part
in 1000 of the phage applied to the column. This population
is infected into 3 x 10^11 phage-sensitive PE384 in 300 ml
of LB broth. The very low multiplicity of infection (moi)
is chosen to reduce the possibility of multiple infection.
After thirty minutes, viable phage have entered recipient
cells but have not yet begun to produce new phage.
Phage-borne genes are expressed at this phase, and we can
add ampicillin that will kill uninfected cells. These
cells still carry F-pili and will absorb phage, helping to
prevent multiple infections.
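The effect of the low moi can be estimated with a short calculation. The sketch below is illustrative only: it assumes that phage adsorb to cells independently, so that the number of phage per cell follows a Poisson distribution; the function name and dictionary keys are hypothetical and are not taken from this specification.

```python
import math

def infection_stats(n_phage: float, n_cells: float) -> dict:
    """Poisson model (an assumption): P(k phage in a cell) = m^k e^-m / k!."""
    moi = n_phage / n_cells
    p_infected = -math.expm1(-moi)       # P(k >= 1) = 1 - e^-m
    p_single = moi * math.exp(-moi)      # P(k == 1)
    p_multiple = p_infected - p_single   # P(k >= 2)
    return {
        "moi": moi,
        "fraction_cells_infected": p_infected,
        "fraction_infected_cells_multiply_infected": p_multiple / p_infected,
    }

# About 10^9 pooled phage infected into 3 x 10^11 cells, as in the text.
stats = infection_stats(1e9, 3e11)
for name, value in stats.items():
    print(f"{name}: {value:.3g}")
```

Under this assumption the moi is about 1/300, and only roughly one infected cell in six hundred would be expected to carry more than one phage genome.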

If multiple infection should pose a problem that cannot be
solved by growth at low multiplicity of infection on F+
cells, the following procedure can be employed to obviate
the problem. Virions obtained from the affinity separation
are infected into F+ E. coli and cultured to amplify the
genetic messages (Sec. V). CCC DNA is obtained either by
harvesting RF DNA or by in vitro extension of primers
annealed to ss phage DNA. The CCC DNA is used to transform
F- cells at a high ratio of cells to DNA. Individual
virions obtained in this way should bear only proteins
encoded by the DNA within.

The phagemid population is grown and chromatographed
three times and then examined for SBDs (Sec. V). In each
separation cycle, phage from the last three fractions
that contain viable phage are pooled with phage obtained
by removing some of the support matrix as an inoculum. At
each cycle, about 10^12 phage are loaded onto the column
and about 10^9 phage are cultured for the next separation
cycle. After the third separation cycle, SBD colonies are
picked from the last fraction that contained viable phage.

Each of the SBDs is cultured and tested for retention
on a Pep-Tie column supporting HHMb. The phage showing
the greatest retention on the Pep-Tie {HHMb} column is
selected. This SBD! becomes the parental amino-acid
sequence for the second variegation cycle.

Assume for the sake of argument that, in SBD!, R40
changed to D, I42 changed to Q, A50 changed to E, L52
remained L, and A71 changed to W (see Table 38). If so, a
rational plan for the second round of variegation would be
that which is set forth in Table 39. The residues to be
varied are chosen by: a) choosing some of the residues in
the principal set that were not varied in the first round
(viz. residues 42, 44, 51, 54, 55, 72, or 75 of the
fusion), and b) choosing some residues in the secondary
set. Residues 51, 54, 55, and 72 are varied through all
twenty amino acids and, unavoidably, stop. Residue 44 is
only varied between Y and F. Some residues in the
secondary set are varied through a restricted range,
primarily to allow different charges (+, 0, -) to appear.
Residue 38 is varied through K, R, E, or G. Residue 41 is
varied through I, V, K, or E. Residue 43 is varied
through R, S, G, N, K, D, E, T, or A.

Now assume that in the most successful SBD of the
second round of variegation (SBD-2!), residue 38 (K15 of
BPTI) changed to E, 41 becomes V, 43 goes to N, 44 goes to
F, 51 goes to F, 54 goes to S, 55 goes to A, and 72 goes
to Q (see Table 40). A third round of variation is
illustrated in Table 41; eight amino acids are varied.
Those in the principal set, residues 40, 55, and 57, are
varied through all twenty amino acids. Residue 32 is
varied through P, Q, T, K, A, or E. Residue 34 is varied
through T, P, Q, K, A, or E. Residue 44 is varied through
F, L, Y, C, W, or stop. Residue 50 is varied through E,
K, or Q. Residue 52 is varied through L, F, I, M, or V.
The result of this variation is shown in Table 42.

This example is hypothetical. It is anticipated
that more variegation cycles will be needed to achieve
dissociation constants of 10^-8 M. It is also possible
that more than three separation cycles will be needed in
some variegation cycles. Real DNA chemistry and DNA
synthesizers may have larger errors than our hypothetical
5%. If S_err > 0.05, then we may not be able to vary six
residues at once. Variation of 5 residues at once is
certainly possible.

EXAMPLE XII 

DESIGN AND MUTAGENESIS OF A CLASS 1 MINI-PROTEIN 

To obtain a library of binding domains that are
conformationally constrained by a single disulfide, we
insert DNA coding for the following family of mini-proteins
into the gene coding for a suitable OSP.

      |-------------|
X1-X2-C-X3-X4-X5-X6-C-X7-X8

Where |---| indicates disulfide bonding; this mini-protein
is depicted in Figure 3. Disulfides normally do
not form between cysteines that are consecutive on the
polypeptide chain. One or more of the residues indicated
above as Xn will be varied extensively to obtain novel
binding. There may be one or more amino acids that
precede X1 or follow X8; however, these additional
residues will not be significantly constrained by the
diagrammed disulfide bridge, and it is less advantageous
to vary these remote, unbridged residues. The last X
residue is connected to the OSP of the genetic package.

X1, X2, X3, X4, X5, X6, X7, and X8 can be varied
independently; i.e. a different scheme of variegation
could be used at each position. X1 and X8 are the least
constrained residues and may be varied less than other
positions.

X1 and X8 can be, for example, one of the amino acids
[E, K, T, and A]; this set of amino acids is preferred
because: a) the possibility of positively charged,
negatively charged, and neutral amino acids is provided,
b) these amino acids can be provided in 1:1:1:1 ratio via
the codon RMG (R = equimolar A and G, M = equimolar A and
C), and c) these amino acids allow proper processing by
signal peptidases.
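The amino acids encoded by a variegated codon such as RMG can be checked by expanding the ambiguity codes against the standard genetic code. The following is a minimal sketch; the helper names (expand, encoded) are hypothetical, and "*" is used here for stop where the tables of this specification use ".".

```python
from itertools import product

# IUB ambiguity codes (as listed in Table 1) mapped to the bases they stand for.
IUB = {"A": "A", "C": "C", "G": "G", "T": "T",
       "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
       "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT"}

# Standard genetic code in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def expand(codon: str) -> list:
    """List every unambiguous codon covered by a degenerate codon."""
    return ["".join(p) for p in product(*(IUB[b] for b in codon))]

def encoded(codon: str) -> dict:
    """Map each covered codon to the amino acid it encodes."""
    return {c: CODON_TABLE[c] for c in expand(codon)}

print(encoded("RMG"))                        # four codons, one per amino acid
print(sorted(set(encoded("NNT").values())))  # the fifteen NNT amino acids
```

Each of the four RMG codons encodes a different amino acid, which is why an equimolar mixture at R and M yields the 1:1:1:1 ratio stated above.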

One option for variegation of X2, X3, X4, X5, X6, and
X7 is to vary all of these in the same way. For example,
each of X2, X3, X4, X5, X6, and X7 can be chosen from the
set [F, S, Y, C, L, P, H, R, I, T, N, V, A, D, and G]
which is encoded by the mixed codon NNT. Tables 10 and
130 compare libraries in which six codons have been
varied either by NNT or NNK codons. NNT encodes 15
different amino acids and only 16 DNA sequences. Thus,
there are 1.139 x 10^7 amino-acid sequences, no stops, and
only 1.678 x 10^7 DNA sequences. A library of 10^8
independent transformants will contain 99% of all possible
sequences. The NNK library contains 6.4 x 10^7 sequences,
but complete sampling requires a much larger number of
independent transformants.
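The figures quoted for the NNT and NNK libraries can be reproduced with a few lines of arithmetic. The sketch below assumes that every DNA sequence is equally likely and that transformants sample the library independently (Poisson sampling), so that a library of N transformants contains a fraction 1 - exp(-N/S) of S possible sequences; the function name is hypothetical.

```python
import math

def fraction_sampled(n_transformants: float, n_sequences: float) -> float:
    """Expected fraction of equiprobable sequences present (Poisson assumption)."""
    return 1.0 - math.exp(-n_transformants / n_sequences)

N_CODONS = 6                   # six variegated positions, X2 through X7
nnt_dna = 16 ** N_CODONS       # NNT: 16 codons per position, no stops
nnt_aa = 15 ** N_CODONS        # NNT: 15 amino acids per position
nnk_dna = 32 ** N_CODONS       # NNK: 32 codons per position (one is a stop)
nnk_aa = 20 ** N_CODONS        # NNK: all 20 amino acids per position

print(f"NNT: {nnt_aa:.3e} amino-acid seqs, {nnt_dna:.3e} DNA seqs")
print(f"NNK: {nnk_aa:.3e} amino-acid seqs, {nnk_dna:.3e} DNA seqs")
print(f"10^8 transformants sample {fraction_sampled(1e8, nnt_dna):.1%} of NNT DNA")
print(f"10^8 transformants sample {fraction_sampled(1e8, nnk_dna):.1%} of NNK DNA")
```

On these assumptions the same 10^8 transformants that nearly exhaust the NNT library cover less than a tenth of the NNK DNA sequences.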

EXAMPLE XIII 

A CYS::HELIX::TURN::STRAND::CYS UNIT

The parental Class 2 mini-protein may be a naturally-
occurring Class 2 mini-protein. It may also be a
domain of a larger protein whose structure satisfies or
may be modified so as to satisfy the criteria of a class 2
mini-protein. The modification may be a simple one, such
as the introduction of a cysteine (or a pair of cysteines)
into the base of a hairpin structure so that the hairpin
may be closed off with a disulfide bond, or a more
elaborate one, such as the modification of intermediate
residues so as to achieve the hairpin structure. The
parental class 2 mini-protein may also be a composite of
structures from two or more naturally-occurring proteins,
e.g., an α helix of one protein and a β strand of a second
protein.

One mini-protein motif of potential use comprises a
disulfide loop enclosing a helix, a turn, and a return
strand. Such a structure could be designed or it could
be obtained from a protein of known 3D structure.
Scorpion neurotoxin, variant 3, (ALMA83a, ALMA83b)
(hereafter ScorpTx) contains a structure diagrammed in
Figure 15 that comprises a helix (residues N22 through
N33), a turn (residues 33 through 35), and a return strand
(residues 36 through 41). ScorpTx contains disulfides
that join residues 12-65, 16-41, 25-46, and 29-48. CYS25
and CYS41 are quite close and could be joined by a
disulfide without deranging the main chain. Figure 15
shows CYS25 joined to CYS41. In addition, CYS29 has been
changed to GLN. It is expected that a disulfide will form
between 25 and 41 and that the helix shown will form; we
know that the amino-acid sequence shown is highly
compatible with this structure. The presence of GLY35,
GLY36, and GLY39 gives the turn and extended strand
sufficient flexibility to accommodate any changes needed
around CYS41 to form the disulfide.

From examination of this structure (as found in entry
1SN3 of the Brookhaven Protein Data Bank), we see that
the following sets of residues would be preferred for
variegation:

SET 1

    Residue   Codon   Allowed amino acids   Naa/Ndna
 1) T27       NNG     L2R2MVSPTAQKEWG.      13/15
 2) E28       VHG     LMVPTAQKE             9/9
 3) A31       VHG     LMVPTAQKE             9/9
 4) K32       VHG     LMVPTAQKE             9/9
 5) G24       NNG     L2R2MVSPTAQKEWG.      13/15
 6) E23       VHG     LMVPTAQKE             9/9
 7) Q34       VAS     HQNKED                6/6

Note: Exponents on amino acids (written here as trailing
digits) indicate multiplicity of codons.



Positions 27, 28, 31, 32, 24, and 23 comprise one
face of the helix. At each of these locations we have
picked a variegating codon that a) includes the parental
amino acid, b) includes a set of residues having a
predominance of helix-favoring residues, c) provides for a
wide variety of amino acids, and d) leads to as even a
distribution as possible. Position 34 is part of a turn.
The side group of residue 34 could interact with molecules
that contact the side groups of residues 27, 28, 31, 32,
24, and 23. Thus we allow variegation here and provide
amino acids that are compatible with turns. The
variegation shown leads to 6.65 x 10^6 amino acid sequences
encoded by 8.85 x 10^6 DNA sequences.
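The totals for SET 1 follow from multiplying the per-position counts. As an illustrative check only, the sketch below expands each variegated codon of SET 1 (NNG, VHG, VHG, VHG, NNG, VHG, VAS) against the standard genetic code and multiplies the numbers of distinct amino acids and of non-stop codons; the helper name counts is hypothetical, and "*" denotes stop.

```python
from itertools import product
from math import prod

IUB = {"A": "A", "C": "C", "G": "G", "T": "T",
       "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
       "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT"}
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def counts(codon: str) -> tuple:
    """Return (distinct amino acids, non-stop codons) for a degenerate codon."""
    aas = [CODON_TABLE["".join(p)] for p in product(*(IUB[b] for b in codon))]
    coding = [a for a in aas if a != "*"]
    return len(set(coding)), len(coding)

# Variegated codons for positions 27, 28, 31, 32, 24, 23, and 34.
SET1 = ["NNG", "VHG", "VHG", "VHG", "NNG", "VHG", "VAS"]

n_aa = prod(counts(c)[0] for c in SET1)
n_dna = prod(counts(c)[1] for c in SET1)
print(f"amino-acid sequences: {n_aa:,}")
print(f"non-stop DNA sequences: {n_dna:,}")
```

The products come to roughly 6.65 x 10^6 amino-acid sequences and just under 8.86 x 10^6 stop-free DNA sequences, in line with the figures quoted above.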



SET 2

    Residue   Codon   Allowed amino acids   Naa/Ndna
 1) D26       VHS     L2IMV2P2T2A2HQNKDE    13/18
 2) T27       NNG     L2R2MVSPTAQKEWG.      13/15
 3) K30       VHG     KEQPTALMV             9/9
 4) A31       VHG     KEQPTALMV             9/9
 5) K32       VHG     LMVPTAQKE             9/9
 6) S37       RRT     SNDG                  4/4
 7) Y38       NHT     YSFHPLNTIDAV          9/9

Positions 26, 27, 30, 31, and 32 are variegated so as
to enhance helix-favoring amino acids in the population.
Residues 37 and 38 are in the return strand so that we
pick different variegation codons. This variegation
allows 4.43 x 10^6 amino-acid sequences and 7.08 x 10^6 DNA
sequences. Thus a library that embodies this scheme can
be sampled very efficiently.

EXAMPLE XIV

DESIGN AND MUTAGENESIS OF CLASS 3 MINI-PROTEIN

Two Disulfide Bond Parental Mini-Proteins

Mini-proteins with two disulfide bonds may be modelled
after the α-conotoxins, e.g., GI, GIA, GII, MI, and SI.
These have the following conserved structure:

          1 2         1'        2'
(1-2 AAs)-C-C-(3 AAs)-C-(5 AAs)-C-(0-5 AAs)

Hashimoto et al. (HASH85) reported synthesis of
twenty-four analogues of α-conotoxins GI, GII, and MI.
Using the numbering scheme for GI (CYS at positions 2, 3,
7, and 13), Hashimoto et al. reported alterations at 4, 8,
10, and 12 that allow the proteins to be toxic. Almquist
et al. (ALMQ89) synthesized [des-GLU1] α-conotoxin GI and
twenty analogues. They found that substituting GLY for
PRO5 gave rise to two isomers, perhaps related to
different disulfide bonding. They found a number of
substitutions at residues 8 through 11 that allowed the
protein to be toxic. Zafaralla et al. (ZAFA88) found that
substituting PRO at position 9 gives an active protein.
Each of the groups cited used only in vivo toxicity as an
assay for the activity. From such studies, one can infer
that an active protein has the parental 3D structure, but
one cannot infer that an inactive protein lacks the
parental 3D structure.

Pardi et al. (PARD89) determined the 3D structure of
α-conotoxin GI obtained from venom by NMR. Kobayashi et
al. (KOBA89) have reported a 3D structure of synthetic
α-conotoxin GI from NMR data which agrees with that of
PARD89. We refer to Figure 5 of Pardi et al.

Residue GLU1 is known to accommodate GLU, ARG, and
ILE in known analogues or homologues. A preferred
variegation codon is NNG that allows the set of amino
acids [L2R2MVSPTAQKEWG<stop>]. From Figure 5 of Pardi et
al. we see that the side group of GLU1 projects into the
same region as the strand comprising residues 9 through
12. Residues 2 and 3 are cysteines and are not to be
varied. The side group of residue 4 points away from
residues 9 through 12; thus we defer varying this residue
until a later round. PRO5 may be needed to cause the
correct disulfides to form; when GLY was substituted here
the peptide folded into two forms, neither of which is
toxic. It is allowed to vary PRO5, but not preferred in
the first round.

No substitutions at ALA6 have been reported. A
preferred variegation codon is RMG which gives rise to
ALA, THR, LYS, and GLU (small hydrophobic, small
hydrophilic, positive, and negative). CYS7 is not varied. We
prefer to leave GLY8 as is, although a homologous protein
having ALA8 is toxic. Homologous proteins having various
amino acids at position 9 are toxic; thus, we use an NNT
variegation codon which allows FS2YCLPHRITNVADG. We use
NNT at positions 10, 11, and 12 as well. At position 14,
following the fourth CYS, we allow ALA, THR, LYS, or GLU
(via an RMG codon). This variegation allows 1.053 x 10^7
amino-acid sequences, encoded by 1.68 x 10^7 DNA sequences.
Libraries having 2.0 x 10^7, 3.0 x 10^7, and 5.0 x 10^7
independent transformants will, respectively, display ≈70%,
≈83%, and ≈95% of the allowed sequences. Other variegations
are also appropriate. Concerning α-conotoxins, see, inter
alia, ALMQ89, CRUZ85, GRAY83, GRAY84, and PARD89.
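The percentages quoted for the α-conotoxin library are consistent with a simple sampling estimate. The sketch below is an illustration under stated assumptions, not a statement of how the figures were originally derived: it assumes equiprobable DNA sequences and independent transformants, and it inverts the same relation to estimate how many transformants a target coverage would require. Both function names are hypothetical.

```python
import math

def fraction_displayed(n_transformants: float, n_sequences: float) -> float:
    """Expected fraction of sequences displayed (Poisson sampling assumption)."""
    return 1.0 - math.exp(-n_transformants / n_sequences)

def transformants_needed(target_fraction: float, n_sequences: float) -> float:
    """Invert the relation above: N = -S * ln(1 - f)."""
    return -n_sequences * math.log(1.0 - target_fraction)

N_DNA = 1.68e7  # DNA sequences in the alpha-conotoxin variegation above

for n in (2.0e7, 3.0e7, 5.0e7):
    print(f"{n:.1e} transformants -> {fraction_displayed(n, N_DNA):.0%}")

print(f"99% coverage needs about {transformants_needed(0.99, N_DNA):.2e}")
```

On these assumptions, raising coverage from 95% to 99% requires roughly half again as many independent transformants.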

The parental mini-protein may instead be one of the
proteins designated "Hybrid-I" and "Hybrid-II" by Pease et
al. (PEAS90); cf. Figure 4 of PEAS90. One preferred set
of residues to vary for either protein consists of:





Parental      Variegated   Allowed              AA seqs/
Amino acid    Codon        Amino acids          DNA seqs

A5            RVT          ADGTNS               6/6
P6            VYT          PTALIV               6/6
E7            RRS          EDNKSRG2             7/8
T8            VHG          TPALMVQKE            9/9
A9            VHG          ATPLMVQKE            9/9
A10           RMG          AEKT                 4/4
K12           VHG          KQETPALMV            9/9
Q16           NNG          L2R2S.WPQMTKVAEG     13/15

This provides 9.55 x 10^6 amino-acid sequences encoded by
1.26 x 10^7 DNA sequences. A library comprising 5.0 x 10^7
transformants allows expression of 98.2% of all possible
sequences. At each position, the parental amino acid is
allowed.

At position 5 we provide amino acids that are
compatible with a turn. At position 6 we allow ILE and
VAL because they have branched β carbons and make the
chain rigid. At position 7 we allow ASP, ASN, and SER
that often appear at the amino termini of helices. At
positions 8 and 9 we allow several helix-favoring amino
acids (ALA, LEU, MET, GLN, GLU, and LYS) that have
differing charges and hydrophobicities because these are
part of the helix proper. Position 10 is further around
the edge of the helix, so we allow a smaller set (ALA,
THR, LYS, and GLU). This set not only includes 3 helix-
favoring amino acids plus THR that is well tolerated but
also allows positive, negative, and neutral hydrophilic
side groups. The side groups of 12 and 16 project into the
same region as the residues already recited. At these
positions we allow a wide variety of amino acids with a
bias toward helix-favoring amino acids.

The parental mini-protein may instead be a polypeptide
composed of residues 9-24 and 31-40 of aprotinin and
possessing two disulfides (Cys9-Cys22 and Cys14-Cys38).
Such a polypeptide would have the same disulfide bond
topology as α-conotoxin, and its two bridges would have
spans of 12 and 17, respectively.

Residues 23, 24 and 31 are variegated to encode the
amino acid residue set [G,S,R,D,N,H,P,T,A] so that a
sequence that favors a turn of the necessary geometry is
found. We use trypsin or anhydrotrypsin as the affinity
molecule to enrich for GPs that display a mini-protein
that folds into a stable structure similar to BPTI in the
P1 region.

Three Disulfide Bond Parental Mini-Proteins 



The cone snails (Conus) produce venoms (conotoxins)
which are 10-30 amino acids in length and exceptionally
rich in disulfide bonds. They are therefore archetypal
mini-proteins. Novel mini-proteins with three disulfide
bonds may be modelled after the μ-(GIIIA, GIIIB, GIIIC) or
ω-(GVIA, GVIB, GVIC, GVIIA, GVIIB, MVIIA, MVIIB, etc.)
conotoxins. The μ-conotoxins have the following conserved
structure:



        1 2         3         1'        2'3'
(2 AAs)-C-C-(5 AAs)-C-(4 AAs)-C-(4 AAs)-C-C-AA



No 3D structure of a μ-conotoxin has been published.
Hidaka et al. (HIDA90) have established the connectivity
of the disulfides. The following diagram depicts
geographutoxin I (also known as μ-conotoxin GIIIA).



R1-D2-C3-C4-T5-P6-P7-K8-K9-C10-K11-D12-R13-Q14-C15-K16-P17-Q18-R19-C20-C21-A22

Disulfides: C3::C15, C4::C20, C10::C21

The connection from R19 to C20 could go over or under the
strand from Q14 to C15. One preferred form of variegation
is to vary the residues in one loop. Because the longest
loop contains only five amino acids, it is appropriate to
also vary the residues connected to the cysteines that
form the loop. For example, we might vary residues 5
through 9 plus 2, 11, 19, and 22. Another useful
variegation would be to vary residues 11-14 and 16-19, each
through eight amino acids. Concerning μ-conotoxins, see
BECK89b, BECK89c, CRUZ89, and HIDA90.
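The loop lengths referred to above can be read directly from the sequence. The following sketch is illustrative only: it uses the 22-residue sequence as drawn in the diagram (with the proline positions written as P), locates the cysteines, and reports how many residues lie between consecutive cysteines. The variable names are hypothetical.

```python
# Residues R1 through A22 as drawn in the diagram of geographutoxin I.
SEQUENCE = "RDCCTPPKKCKDRQCKPQRCCA"

# Disulfide connectivity as reported in the text (1-based residue numbers).
DISULFIDES = [(3, 15), (4, 20), (10, 21)]

# 1-based positions of every cysteine in the chain.
cys_positions = [i + 1 for i, aa in enumerate(SEQUENCE) if aa == "C"]

# Number of residues strictly between each pair of consecutive cysteines.
loops = {(a, b): b - a - 1 for a, b in zip(cys_positions, cys_positions[1:])}

print("cysteines at:", cys_positions)
for (a, b), length in loops.items():
    segment = SEQUENCE[a:b - 1]   # residues a+1 .. b-1
    print(f"C{a}..C{b}: {length} residues {segment!r}")

# Every cysteine should take part in exactly one disulfide.
assert sorted(p for pair in DISULFIDES for p in pair) == cys_positions
```

The longest stretch between consecutive cysteines is the five-residue segment from T5 through K9, which is the loop the text proposes to vary together with its flanking residues.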

The ω-conotoxins may be represented as follows:

1         2         3 1'          2'          3'
C-(6 AAs)-C-(6 AAs)-C-C-(2-3 AAs)-C-(4-6 AAs)-C



The King Kong peptide has the same disulfide arrangement
as the ω-conotoxins but a different biological activity.
Woodward et al. (WOOD90) report the sequences of three
homologous proteins from C. textile. Within the mature
toxin domain, only the cysteines are conserved. The
spacing of the cysteines is exactly conserved, but no
other position has the same amino acid in all three
sequences and only a few positions show even pair-wise
matches. Thus we conclude that all positions (except the
cysteines) may be substituted freely with a high
probability that a stable disulfide structure will form.
Concerning ω-conotoxins, see HILL89 and SUNX87.

Another mini-protein which may be used as a parental
binding domain is the Cucurbita maxima trypsin inhibitor I
(CMTI-I); CMTI-III is also appropriate. They are members
of the squash family of serine protease inhibitors, which
also includes inhibitors from summer squash, zucchini, and
cucumbers (WIEC85). McWherter et al. (MCWH89) describe
synthetic sequence-variants of the squash-seed protease
inhibitors that have affinity for human leukocyte elastase
and cathepsin G. Of course, any member of this family
might be used.

CMTI-I is one of the smallest proteins known,
comprising only 29 amino acids held in a fixed conformation
by three disulfide bonds. The structure has been
studied by Bode and colleagues using both X-ray diffraction
(BODE89) and NMR (HOLA89a,b). CMTI-I is of ellipsoidal
shape; it lacks helices or β-sheets, but consists
of turns and connecting short polypeptide stretches. The
disulfide pairing is Cys3-Cys20, Cys10-Cys22 and
Cys16-Cys28. In the CMTI-I:trypsin complex studied by Bode
et al., 13 of the 29 inhibitor residues are in direct
contact with trypsin; most of them are in the primary
binding segment Val2 (P4)-Glu9 (P4') which contains the
reactive site bond Arg5 (P1)-Ile6 and is in a conformation
observed also for other serine proteinase inhibitors.

CMTI-I has a K_d for trypsin of ≈1.5 x 10^-12 M.
McWherter et al. suggested substitution of "moderately
bulky hydrophobic groups" at P1 to confer HLE specificity.
They found that a wider set of residues (VAL, ILE, LEU,
ALA, PHE, MET, and GLY) gave detectable binding to HLE.
For cathepsin G, they expected bulky (especially aromatic)
side groups to be strongly preferred. They found that
PHE, LEU, MET, and ALA were functional by their criteria;
they did not test TRP, TYR, or HIS. (Note that ALA has
the second smallest side group available.)

A preferred initial variegation strategy would be to
vary some or all of the residues ARG1, VAL2, PRO4, ARG5,
ILE6, LEU7, MET8, GLU9, LYS11, HIS25, GLY26, TYR27, and
GLY29. If the target were HNE, for example, one could
synthesize DNA embodying the following possibilities:



Parental   vg       Allowed           #AA seqs/
           Codon    amino acids       #DNA seqs

ARG1       VNT      RSLPHITNVADG      12/12
VAL2       NWT      VILFYHND          8/8
PRO4       VYT      PLTIAV            6/6
ARG5       VNT      RSLPHITNVADG      12/12
ILE6       NNK      all 20            20/31
LEU7       VWG      LQMKVE            6/6
TYR27      NAS      YHQNKDE.          7/8

This allows about 5.81 x 10^6 amino-acid sequences encoded
by about 1.03 x 10^7 DNA sequences. A library comprising
5.0 x 10^7 independent transformants would give ≈99% of the
possible sequences. Other variegation schemes could also
be used.

Other inhibitors of this family include:
Trypsin inhibitor I from Citrullus vulgaris (OTLE87),
Trypsin inhibitor II from Bryonia dioica (OTLE87),
Trypsin inhibitor I from Cucurbita maxima (in OTLE87),
Trypsin inhibitor III from Cucurbita maxima (in OTLE87),
Trypsin inhibitor IV from Cucurbita maxima (in OTLE87),
Trypsin inhibitor II from Cucurbita pepo (in OTLE87),
Trypsin inhibitor III from Cucurbita pepo (in OTLE87),
Trypsin inhibitor IIb from Cucumis sativus (in OTLE87),
Trypsin inhibitor IV from Cucumis sativus (in OTLE87),
Trypsin inhibitor II from Ecballium elaterium (FAVE89),
and inhibitor CM-1 from Momordica repens (in OTLE87).

Another mini-protein that may be used as an initial
potential binding domain is one of the heat-stable
enterotoxins derived from some enterotoxigenic E. coli,
Citrobacter freundii, and other bacteria (GUAR89). These
mini-proteins are known to be secreted from E. coli and are
extremely stable. Works related to synthesis, cloning,
expression and properties of these proteins include:
BHAT86, SEKI85, SHIM87, TAKA85, TAKE90, THOM85a,b, YOSH85,
DALL90, DWAR89, GARI87, GUZM89, GUZM90, HOUG84, KUBO89,
KUPE90, OKAM87, OKAM88, and OKAM90.

Another preferred IPBD is crambin or one of its
homologues, the phoratoxins and ligatoxins (LECO87).
These proteins are secreted in plants. The 3D structure
of crambin has been determined. NMR data on homologues
indicate that the 3D structure is conserved. Residues
thought to be on the surface of crambin, phoratoxin, or
ligatoxin are preferred residues to vary.
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EXAMPLE XV 



A MINI-PROTEIN HAVING A CROSS-LINK CONSISTING OF Cu(II),
ONE CYSTEINE, TWO HISTIDINES, AND ONE METHIONINE

Sequences such as
HIS-ASN-GLY-MET-Xaa-Xaa-Xaa-Xaa-Xaa-Xaa-HIS-ASN-GLY-CYS
and
CYS-ASN-GLY-MET-Xaa-Xaa-Xaa-Xaa-Xaa-Xaa-HIS-ASN-GLY-HIS
are likely to combine with Cu(II) to form structures as
shown in the diagrams:

         Xaa7----Xaa8
        /            \
     Xaa6            Xaa9
       |               |
     Xaa5            Xaa10
        \            /
        MET4      HIS11
       /    \    /    \
      /      \  /      \
   GLY3       Cu       ASN12
     |       /  \        |
   ASN2-HIS1    CYS14-GLY13
        |            |
       NH2          COO

         Xaa7----Xaa8
        /            \
     Xaa6            Xaa9
       |               |
     Xaa5            Xaa10
        \            /
        MET4      HIS11
       /    \    /    \
      /      \  /      \
   GLY3       Cu       ASN12
     |       /  \        |
   ASN2-CYS1    HIS14-GLY13
        |            |
       NH2          COO

Other arrangements of HIS, MET, HIS, and CYS along the
chain are also likely to form similar structures. The
amino acids ASN-GLY at positions 2 and 3 and at positions
12 and 13 give the amino acids that carry the metal-
binding ligands enough flexibility for them to come
together and bind the metal. Other connecting sequences
may be used, e.g., GLY-ASN, SER-GLY, GLY-PRO, GLY-PRO-GLY,
or PRO-GLY-ASN. It is also possible to vary
one or more residues in the loops that join the first and
second or the third and fourth metal-binding residues.
For example,

          Xaa8----Xaa9
         /            \
      Xaa7            Xaa10
        |               |
      Xaa6            Xaa11
         \            /
         MET5      HIS12
        /    \    /    \
    Xaa4      \  /      \
      |        \/        \
    PRO3       Cu        ASN13
       \      /  \         |
      GLY2-HIS1   CYS15-GLY14
           |           |
          NH2         COO

is likely to form the diagrammed structure for a wide
variety of amino acids at Xaa4. It is expected that the
side groups of Xaa4 and Xaa6 will be close together and on
the surface of the mini-protein.

The variable amino acids are held so that they have
limited flexibility. This cross-linkage has some
differences from the disulfide linkage. The separation
between Cα4 and Cα11 is greater than the separation of the
Cαs of a cystine. In addition, the interaction of residues
1 through 4 and 11 through 14 with the metal ion is
expected to limit the motion of residues 5 through 10
more than a disulfide between residues 4 and 11. A single
disulfide bond exerts strong distance constraints on the α
carbons of the joined residues, but very little
directional constraint on, for example, the vector from N
to C in the main-chain.

For the desired sequence, the side groups of residues
5 through 10 can form specific interactions with the
target. Other numbers of variable amino acids, for
example, 4, 5, 7, or 8, are appropriate. Larger spans may
be used when the enclosed sequence contains segments
having a high potential to form α helices or other
secondary structure that limits the conformational freedom
of the polypeptide main chain. Whereas a mini-protein
having four CYSs could form three distinct pairings, a
mini-protein having two HISs, one MET, and one CYS can
form only two distinct complexes with Cu. These two
structures are related by mirror symmetry through the Cu.
Because the two HISs are distinguishable, the structures
are different.
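The statement that four cysteines admit three distinct pairings can be verified by enumeration. The sketch below simply lists every way of pairing off an even number of cysteines; it is a counting illustration, with hypothetical cysteine labels, and says nothing about which pairings are sterically feasible.

```python
def pairings(items: list) -> list:
    """Enumerate all ways to split items into unordered disjoint pairs."""
    if not items:
        return [[]]
    first, rest = items[0], items[1:]
    result = []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            result.append([(first, partner)] + tail)
    return result

four_cys = ["Cys_a", "Cys_b", "Cys_c", "Cys_d"]   # hypothetical labels
for p in pairings(four_cys):
    print(p)

six_cys = [f"Cys{i}" for i in range(1, 7)]        # hypothetical labels
print(len(pairings(four_cys)), "pairings for four cysteines")
print(len(pairings(six_cys)), "pairings for six cysteines")
```

The count grows quickly with the number of cysteines (3 for four, 15 for six), which illustrates why a cross-link with fewer possible arrangements can be attractive.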

When such metal-containing mini-proteins are displayed
on filamentous phage, the cells that produce the
phage can be grown in the presence of the appropriate
metal ion, or the phage can be exposed to the metal only
after they are separated from the cells.

EXAMPLE XVI

A MINI-PROTEIN HAVING A CROSS-LINK CONSISTING OF Zn(II)
AND FOUR CYSTEINES

A cross link similar to the one shown in Example XV
is exemplified by the Zinc-finger proteins (GIBS88,
GAUS87, PARR88, FRAN87, CHOW87, HARD90). One family of
Zinc-fingers has two CYS and two HIS residues in conserved
positions that bind Zn++ (PARR88, FRAN87, CHOW87, EVAN88,
BERG88, CHAV88). Gibson et al. (GIBS88) review a number
of sequences thought to form zinc-fingers and propose a
three-dimensional model for these compounds. Most of
these sequences have two CYS and two HIS residues in
conserved positions, but some have three CYS and one HIS
residue. Gauss et al. (GAUS87) also report a zinc-finger
protein having three CYS and one HIS residues that bind
zinc. Hard et al. (HARD90) report the 3D structure of a
protein that comprises two zinc-fingers, each of which has
four CYS residues. All of these zinc-binding proteins are
stable in the reducing intracellular environment.

One preferred example of a CYS::zinc cross linked
mini-protein comprises residues 440 to 461 of the sequence
shown in Figure 1 of HARD90. The residues 444 through 456
may be variegated. One such variegation is as follows:



Parental   Allowed                                   #AA / #DNA

SER444     SER, ALA                                  2 / 2
ASP445     ASP, ASN, GLU, LYS                        4 / 4
GLU446     GLU, LYS, GLN                             3 / 3
ALA447     ALA, THR, GLY, SER                        4 / 4
SER448     SER, ALA                                  2 / 2
GLY449     GLY, SER, ASN, ASP                        4 / 4
CYS450     CYS, PHE, ARG, LEU                        4 / 4
HIS451     HIS, GLN, ASN, LYS, ASP, GLU              6 / 6
TYR452     TYR, PHE, HIS, LEU                        4 / 4
GLY453     GLY, SER, ASN, ASP                        4 / 4
VAL454     VAL, ALA, ASP, GLY, SER, ASN, THR, ILE    8 / 8
LEU455     LEU, HIS, ASP, VAL                        4 / 4
THR456     THR, ILE, ASN, SER                        4 / 4

This leads to 3.77 x 10^7 DNA sequences that encode the same
number of amino-acid sequences. A library having 1.0 x 10^8
independent transformants will display 93% of the allowed
sequences; 2.0 x 10^8 independent transformants will display
99.5% of allowed sequences.
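The size of this zinc-finger library and the quoted coverage figures can be reproduced from the table. The sketch below is illustrative: it multiplies the number of allowed amino acids at each of residues 444 through 456 and, assuming equiprobable sequences and independent transformants (Poisson sampling), estimates the fraction displayed by libraries of 10^8 and 2 x 10^8 transformants. The dictionary and function names are hypothetical.

```python
import math

# Allowed amino acids at each variegated residue, copied from the table.
ALLOWED = {
    444: ["SER", "ALA"],
    445: ["ASP", "ASN", "GLU", "LYS"],
    446: ["GLU", "LYS", "GLN"],
    447: ["ALA", "THR", "GLY", "SER"],
    448: ["SER", "ALA"],
    449: ["GLY", "SER", "ASN", "ASP"],
    450: ["CYS", "PHE", "ARG", "LEU"],
    451: ["HIS", "GLN", "ASN", "LYS", "ASP", "GLU"],
    452: ["TYR", "PHE", "HIS", "LEU"],
    453: ["GLY", "SER", "ASN", "ASP"],
    454: ["VAL", "ALA", "ASP", "GLY", "SER", "ASN", "THR", "ILE"],
    455: ["LEU", "HIS", "ASP", "VAL"],
    456: ["THR", "ILE", "ASN", "SER"],
}

def fraction_displayed(n_transformants: float, n_sequences: float) -> float:
    """Expected fraction of sequences displayed (Poisson sampling assumption)."""
    return 1.0 - math.exp(-n_transformants / n_sequences)

n_sequences = math.prod(len(options) for options in ALLOWED.values())
print(f"sequences: {n_sequences:,}")
for n in (1.0e8, 2.0e8):
    print(f"{n:.1e} transformants -> {fraction_displayed(n, n_sequences):.1%}")
```

Because each allowed amino acid corresponds to exactly one codon in this scheme, the DNA and amino-acid sequence counts coincide.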
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Table 1: Single-letter codes. 



Single-letter code is used for proteins:

a = ALA   c = CYS   d = ASP   e = GLU   f = PHE
g = GLY   h = HIS   i = ILE   k = LYS   l = LEU
m = MET   n = ASN   p = PRO   q = GLN   r = ARG
s = SER   t = THR   v = VAL   w = TRP   y = TYR
. = STOP  * = any amino acid

b = n or d
z = e or q
x = any amino acid



Single-letter IUB codes for DNA:

T, C, A, G stand for themselves
M for A or C
R for puRines A or G
W for A or T
S for C or G
Y for pYrimidines T or C
K for G or T
V for A, C, or G (not T)
H for A, C, or T (not G)
D for A, G, or T (not C)
B for C, G, or T (not A)
N for any base.
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Table 2: Preferred Outer-Surface Proteins 



Genetic       Preferred
Package       Outer-Surface    Reason for preference
              Protein

M13           coat protein     a) exposed amino terminus,
              (gpVIII)         b) predictable post-
                                  translational processing,
                               c) numerous copies in virion,
                               d) fusion data available.

              gp III           a) fusion data available,
                               b) amino terminus exposed,
                               c) working example available.

PhiX174       G protein        a) known to be on virion
                                  exterior,
                               b) small enough that the
                                  G-ipbd gene can replace
                                  H gene.

E. coli       LamB             a) fusion data available,
                               b) non-essential.

              OmpC             a) topological model,
                               b) non-essential; abundant.

              OmpA             a) topological model,
                               b) non-essential; abundant,
                               c) homologues in other genera.

              OmpF             a) topological model,
                               b) non-essential; abundant.

              PhoE             a) topological model,
                               b) non-essential; abundant,
                               c) inducible.

B. subtilis   CotC             a) no post-translational
spores                            processing,
                               b) distinctive sequence that
                                  causes protein to localize
                                  in spore coat,
                               c) non-essential.

              CotD             Same as for CotC.
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Table 3: Ambiguous DNA for AA_seq2 



  m      k      k      s      l      v      l      k
  1      2      3      4      5      6      7      8
A.T.G  A.A.r  A.A.r  T.C.n  T.T.r  G.T.n  T.T.r  A.A.r
                     A.G.y  C.T.n         C.T.n

  a      s      v      a      v      a      t      l
  9     10     11     12     13     14     15     16
G.C.n  T.C.n  G.T.n  G.C.n  G.T.n  G.C.n  A.C.n  T.T.r
       A.G.y                                     C.T.n

  v      p      m      l      s      f      a      r
 17     18     19     20     21     22     23     24
G.T.n  C.C.n  A.T.G  T.T.r  T.C.n  T.T.y  G.C.n  C.G.n
                     C.T.n  A.G.y                A.G.r

  p      d      f      c      l      e      p      p
 25     26     27     28     29     30     31     32
C.C.n  G.A.y  T.T.y  T.G.y  T.T.r  G.A.r  C.C.n  C.C.n
                            C.T.n

  y      t      g      p      c      k      a      r
 33     34     35     36     37     38     39     40
T.A.y  A.C.n  G.G.n  C.C.n  T.G.y  A.A.r  G.C.n  C.G.n
                                                 A.G.r

  i      i      r      y      f      y      n      a
 41     42     43     44     45     46     47     48
A.T.h  A.T.h  C.G.n  T.A.y  T.T.y  T.A.y  A.A.y  G.C.n

  k      a      g      l      c      q      t      f
 49     50     51     52     53     54     55     56
A.A.r  G.C.n  G.G.n  T.T.r  T.G.y  C.A.r  A.C.n  T.T.y
                     C.T.n

  v      y      g      g      c      r      a      k
 57     58     59     60     61     62     63     64
G.T.n  T.A.y  G.G.n  G.G.n  T.G.y  C.G.n  G.C.n  A.A.r
                                   A.G.r
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Table 3, continued. 



  r      n      n      f      k      s      a      e
 65     66     67     68     69     70     71     72
C.G.n  A.A.y  A.A.y  T.T.y  A.A.r  T.C.n  G.C.n  G.A.r
A.G.r                              A.G.y

  d      c      m      r      t      c      g      g
 73     74     75     76     77     78     79     80
G.A.y  T.G.y  A.T.G  C.G.n  A.C.n  T.G.y  G.G.n  G.G.n

  a      a      e      g      d      d      p      a
 81     82     83     84     85     86     87     88
G.C.n  G.C.n  G.A.r  G.G.n  G.A.y  G.A.y  C.C.n  G.C.n

  k      a      a      f      n      s      l      q
 89     90     91     92     93     94     95     96
A.A.r  G.C.n  G.C.n  T.T.y  A.A.y  T.C.n  T.T.r  C.A.r
                                   A.G.y  C.T.n

  a      s      a      t      e      y      i      g
 97     98     99    100    101    102    103    104
G.C.n  T.C.n  G.C.n  A.C.n  G.A.r  T.A.y  A.T.h  G.G.n
       A.G.y

  y      a      w      a      m      v      v      v
105    106    107    108    109    110    111    112
T.A.y  G.C.n  T.G.G  G.C.n  A.T.G  G.T.n  G.T.n  G.T.n

  i      v      g      a      t      i      g      i
113    114    115    116    117    118    119    120
A.T.h  G.T.n  G.G.n  G.C.n  A.C.n  A.T.h  G.G.n  A.T.h

  k      l      f      k      k      f      t      s
121    122    123    124    125    126    127    128
A.A.r  T.T.r  T.T.y  A.A.r  A.A.r  T.T.y  A.C.n  T.C.n
       C.T.n                                     A.G.y

  k      a      s      .      .      .
129    130    131    132    133    134
A.A.r  G.C.n  T.C.n  T.A.r  T.A.r  T.A.r
              A.G.y  T.G.A  T.G.A  T.G.A
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Table 4: Table of Restriction Enzyme Suppliers 



Suppliers : 

Sigma Chemical Co. 

P.O.Box 14508 

St. Louis, Mo. 63178 

Bethesda Research Laboratories 
P.O.Box 6009 

Gaithersburg, Maryland, 20877 

Boehringer Mannheim Biochemicals 
7941 Castleway Drive 
Indianapolis, Indiana, 46250 

International Biochemicals, Inc. 
P.O.Box 9558 

New Haven, Connecticut, 06535

New England BioLabs
32 Tozer Road
Beverly, Massachusetts, 01915

Promega
2800 S. Fish Hatchery Road
Madison, Wisconsin, 53711

Stratagene Cloning Systems 
11099 North Torrey Pines Road 
La Jolla, California, 92037 



Table 5: Potential sites in ipbd gene. 



Summary of cuts. 

Enz = %Acc I has 3 elective sites : 96 169 281
Enz = Afl II has 1 elective sites : 19
Enz = Apa I has 2 elective sites : 102 103
Enz = Asu II has 1 elective sites : 381
Enz = Ava III has 1 elective sites : 314
Enz = BspM II has 1 elective sites : 72
Enz = BssH II has 2 elective sites : 67 115
Enz = %BstX I has 1 elective sites : 323
Enz = +Dra II has 3 elective sites : 102 103 226
Enz = +EcoN I has 2 elective sites : 62 94
Enz = +Esp I has 2 elective sites : 57 187
Enz = Hind III has 6 elective sites : 9 23 60

Enz = Kpn I has 1 elective sites : 48
Enz = Mlu I has 1 elective sites : 314
Enz = Nar I has 2 elective sites : 238 343
Enz = Nco I has 1 elective sites : 323
Enz = Nhe I has 3 elective sites : 25 289 388
Enz = Nru I has 2 elective sites : 38 65
Enz = +PflM I has 1 elective sites : 94
Enz = PmaC I has 1 elective sites : 228
Enz = +PpuM I has 2 elective sites : 102 226
Enz = +Rsr II has 1 elective sites : 102
Enz = +Sfi I has 2 elective sites : 24 261
Enz = Spe I has 3 elective sites : 12 45 379
Enz = Sph I has 1 elective sites : 221
Enz = Stu I has 5 elective sites : 23 70 150 287 386
Enz = %Sty I has 6 elective sites : 11 44 143 263 323 383
Enz = Xba I has 1 elective sites : 84
Enz = Xho I has 1 elective sites : 85
Enz = Xma III has 3 elective sites : 70 209



287 361 386 
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Enzymes not cutting ipbd . 



Avr II     BamH I     Bcl I      BstE II
EcoR I     EcoR V     Hpa I      Not I
Sac I      Sal I      Sau I      Sma I
Xma I
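
An elective site, as tabulated in Table 5, is a place where the ambiguous DNA of Table 3 can be resolved so that a restriction site appears without changing the encoded protein. The sketch below shows one way such sites could be located for an unambiguous recognition sequence. It is an illustration under stated assumptions: the function name `elective_sites` is hypothetical, positions are reported as 1-based nucleotide offsets (the numbering convention of Table 5 is not stated in this section), and enzymes with degenerate recognition sequences are not handled.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "." marks a stop codon.
AMINO = ("FFLLSSSSYY..CC.W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = {}  # amino acid -> list of codons that encode it
for i, c in enumerate(product(BASES, repeat=3)):
    CODONS.setdefault(AMINO[i], []).append("".join(c))


def elective_sites(protein: str, site: str) -> list[int]:
    """1-based nucleotide positions where `site` can be placed silently."""
    protein = protein.upper()
    hits = []
    for start in range(3 * len(protein) - len(site) + 1):
        end = start + len(site)
        ok = True
        for k in range(start // 3, (end - 1) // 3 + 1):
            # Bases of the site that fall inside codon k.
            need = [(p - 3 * k, site[p - start])
                    for p in range(max(start, 3 * k), min(end, 3 * k + 3))]
            # Codons are chosen independently, so one match per codon suffices.
            if not any(all(cod[j] == b for j, b in need)
                       for cod in CODONS[protein[k]]):
                ok = False
                break
        if ok:
            hits.append(start + 1)
    return hits


if __name__ == "__main__":
    # Hind III (AAGCTT) fits across a Lys-Leu pair as AAG CTT.
    print(elective_sites("akl", "AAGCTT"))
```

Because each codon can be chosen independently of its neighbours, checking the codons one at a time is sufficient; no joint search over the whole gene is needed.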
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Table 6: Exposure of amino acid types in T4 lzm & HEWL. 



HEADER    HYDROLASE (O-GLYCOSYL)          18-AUG-86   2LZM
COMPND    LYSOZYME (E.C.3.2.1.17)
AUTHOR    L.H.WEAVER,B.W.MATTHEWS

Coordinates from Brookhaven Protein Data Bank: 1LYM.
Only Molecule A was considered.

HEADER    HYDROLASE (O-GLYCOSYL)          29-JUL-82   1LYM
COMPND    LYSOZYME (E.C.3.2.1.17)
AUTHOR    J.HOGLE,S.T.RAO,M.SUNDARALINGAM

Solvent radius = 1.40    Atomic radii in Table 7.
Surface area measured in Å².



                                                   Max
Type    N    <area>   sigma     max      min       exposed (fraction)

ALA    27    211.0    1.47     214.3    207.1       85.1 (0.40)
CYS    10    239.8    3.56     245.5    234.4       38.3 (0.16)
ASP    17    271.1    5.36     281.4    262.5      127.1 (0.47)
GLU    10    297.2    5.78     304.9    285.4      100.7 (0.34)
PHE     8    316.6    5.92     325.4    307.5       99.8 (0.32)
GLY    23    185.5    1.31     188.3    183.3       91.9 (0.50)
HIS     2    297.7    3.23     301.0    294.5       32.9 (0.11)
ILE    16    278.1    3.61     285.6    269.6       57.5 (0.21)
LYS    19    309.2    5.38     321.9    300.1      147.1 (0.48)
LEU    24    282.6    6.75     304.0    269.8      109.9 (0.39)
MET     7    293.0    5.70     299.5    283.1       88.2 (0.30)
ASN    26    273.0    5.75     285.1    262.6      143.4 (0.53)
PRO     5    239.9    2.75     242.1    234.6      128.7 (0.54)
GLN     8    299.5    4.75     305.8    291.5      145.9 (0.49)
ARG    24    344.7    8.66     355.8    326.7      240.7 (0.70)
SER    16    228.6    3.59     236.6    223.3       98.2 (0.43)
THR    18    250.3    3.89     257.2    244.2      139.9 (0.56)
VAL    15    254.3    4.05     261.8    245.7      111.1 (0.44)
TRP     9    359.4    3.38     366.4    355.1      102.0 (0.28)
TYR     9    335.8    4.97     342.0    325.0       72.6 (0.22)
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Table 7: Atomic radii 
                  Å

                  1.70
O carbonyl        1.52
N amide           1.55
Other atoms       1.80



Table 8 

Fraction of DNA molecules having
n non-parental bases when
reagents have fraction
M of parental nucleotide.

M      .9965      .97716     .92612     .8577      .79433     .63096

f0     .9000      .5000      .1000      .0100      .0010      .000001
f1     .09499     .35061     .2393      .04977     .00777     .0000175
f2     .00485     .1188      .2768      .1197      .0292      .000149
f3     .00016     .0259      .2061      .1854      .0705      .000812
f4     .000004    .00409     .1110      .2077      .1232      .003207
f8     0.         2*10^-7    .00096     .0336      .1182      .080165
f16    0.         0.         0.         5*10^-7    .00006     .027281
f23    0.         0.         0.         0.         0.         .0000089

most   0          0          2          5          7          12

"most" is the value of n having the highest
probability.
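
The f0 to f4 entries of Table 8 can be reproduced with a binomial distribution. The sketch below assumes a stretch of 30 independently synthesized bases; that length is not stated in this section and is inferred from the f0 row (for example, .9965^30 is about .90 and .97716^30 is about .50). The function name is illustrative.

```python
from math import comb


def fraction_with_n_changes(n: int, m: float, length: int = 30) -> float:
    """Binomial probability of exactly n non-parental bases.

    m is the fraction of parental nucleotide in each reagent; `length`
    (assumed to be 30 here) is the number of mutagenized bases.
    """
    return comb(length, n) * (1.0 - m) ** n * m ** (length - n)


if __name__ == "__main__":
    for m in (0.9965, 0.97716, 0.92612):
        row = [fraction_with_n_changes(n, m) for n in (0, 1, 2, 3, 4)]
        print(m, " ".join(f"{f:.5f}" for f in row))
```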
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Table 9: best vgCodon 



Program "Find Optimum vgCodon."
INITIALIZE-MEMORY-OF-ABUNDANCES
DO ( t1 = 0.21 to 0.31 in steps of 0.01 )
. DO ( c1 = 0.13 to 0.23 in steps of 0.01 )
. . DO ( a1 = 0.23 to 0.33 in steps of 0.01 )
Comment calculate g1 from other concentrations
. . . g1 = 1.0 - t1 - c1 - a1
. . . IF( g1 .ge. 0.15 )
. . . . DO ( a2 = 0.37 to 0.50 in steps of 0.01 )
. . . . . DO ( c2 = 0.12 to 0.20 in steps of 0.01 )
Comment Force D+E = R + K
. . . . . . g2 = (g1*a2 - 0.5*a1*a2)/(c1 + 0.5*a1)
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF( g2.gt.0.1 .and. t2.gt.0.1 )
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . end_IF_block
. . . . . end_DO_loop ! c2
. . . . end_DO_loop ! a2
. . . end_IF_block ! if g1 big enough
. . end_DO_loop ! a1
. end_DO_loop ! c1
end_DO_loop ! t1

WRITE the best distribution and the abundances.
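
The search of Table 9 can be sketched in a few lines of Python. Two points are assumptions rather than statements of the specification: the third base is taken as an equimolar T/G mix (as in Table 10A), and "best" is taken to mean the largest ratio of least-favored to most-favored amino-acid abundance, which is the ratio reported in Table 10. The names `abundances`, `lfaa_mfaa_ratio`, and `find_optimum` are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "." marks a stop codon.
AMINO = ("FFLLSSSSYY..CC.W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}
THIRD = {"T": 0.5, "C": 0.0, "A": 0.0, "G": 0.5}  # assumed third-base mix


def abundances(p1, p2, p3=THIRD):
    """Abundance of each amino acid (and stop) for given base fractions."""
    ab = {}
    for codon, aa in CODE.items():
        ab[aa] = ab.get(aa, 0.0) + p1[codon[0]] * p2[codon[1]] * p3[codon[2]]
    return ab


def lfaa_mfaa_ratio(ab):
    """Least-favored over most-favored amino-acid abundance (stops excluded)."""
    aa_only = [v for k, v in ab.items() if k != "."]
    return min(aa_only) / max(aa_only)


def find_optimum():
    """Grid search of Table 9; loop variables are in hundredths."""
    best_ratio, best = 0.0, None
    for t1, c1, a1 in product(range(21, 32), range(13, 24), range(23, 34)):
        g1 = 100 - t1 - c1 - a1
        if g1 < 15:
            continue
        p1 = {"T": t1 / 100, "C": c1 / 100, "A": a1 / 100, "G": g1 / 100}
        for a2, c2 in product(range(37, 51), range(12, 21)):
            # Force [D]+[E] = [K]+[R], as in the pseudocode.
            g2 = (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)
            t2 = 100 - a2 - c2 - g2
            if g2 > 10 and t2 > 10:
                p2 = {"T": t2 / 100, "C": c2 / 100, "A": a2 / 100, "G": g2 / 100}
                r = lfaa_mfaa_ratio(abundances(p1, p2))
                if r > best_ratio:
                    best_ratio, best = r, (p1, p2)
    return best_ratio, best
```

Calling `abundances` with the distribution of Table 10A (first base .26/.18/.26/.30 and second base .22/.16/.40/.22 for T/C/A/G) gives serine at about 7.02% and tryptophan at about 2.86%. Calling `find_optimum()` runs the full grid and takes noticeably longer.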
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Table 10: Abundances obtained 
from various vgCodon

A. Optimized fxS Codon, Restrained by [D]+[E] = [K]+[R]

       T      C      A      G
1     .26    .18    .26    .30     f
2     .22    .16    .40    .22     x
3     .5     .0     .0     .5      S

Amino                  Amino
acid    Abundance      acid    Abundance
A       4.80%          C       2.86%
D       6.00%          E       6.00%
F       2.86%          G       6.60%
H       3.60%          I       2.86%
K       5.20%          L       6.82%
M       2.86%          N       5.20%
P       2.88%          Q       3.60%
R       6.82%          S       7.02% mfaa
T       4.16%          V       6.60%
W       2.86% lfaa     Y       5.20%
stop    5.20%

[D] + [E] = [K] + [R] = .12
ratio = Abun(W)/Abun(S) = 0.4074

j    (1/ratio)^j    (ratio)^j      stop-free
1      2.454        .4074          .9480
2      6.025        .1660          .8987
3     14.788        .0676          .8520
4     36.298        .0275          .8077
5     89.095        .0112          .7657
6    218.7          4.57*10^-3     .7258
7    536.8          1.86*10^-3     .6881
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Table 10: Abundances obtained 
from various vgCodon 
(continued) 



B. Unrestrained, optimized

       T      C      A      G
1     .27    .19    .27    .27
2     .21    .15    .43    .21
3     .5     .0     .0     .5

Amino                  Amino
acid    Abundance      acid    Abundance
A       4.05%          C       2.84%
D       5.81%          E       5.81%
F       2.84%          G       5.67%
H       4.08%          I       2.84%
K       5.81%          L       6.83%
M       2.84%          N       5.81%
P       2.85%          Q       4.08%
R       6.83%          S       6.89% mfaa
T       4.05%          V       5.67%
W       2.84% lfaa     Y       5.81%
stop    5.81%

[D] + [E] = 0.1162     [K] + [R] = 0.1264
ratio = Abun(W)/Abun(S) = 0.41176

j    (1/ratio)^j    (ratio)^j      stop-free
1      2.4286       .41176         .9419
2      5.8981       .16955         .8872
3     14.3241       .06981         .8356
4     34.7875       .02875         .7871
5     84.4849       .011836        .74135
6    205.180        .004874        .69828
7    498.3          2.007*10^-3    .6577
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Table 10: Abundances obtained 
from various vgCodon 
(continued) 



C. Optimized NNT

       T        C        A        G
1     .2071    .2929    .2071    .2929
2     .2929    .2071    .2929    .2071
3     1.       .0       .0       .0

Amino                  Amino
acid    Abundance      acid    Abundance
A       6.06%          C       4.29% lfaa
D       8.58%          E       none
F       6.06%          G       6.06%
H       8.58%          I       6.06%
K       none           L       8.58%
M       none           N       6.06%
P       6.06%          Q       none
R       6.06%          S       8.58% mfaa
T       4.29% lfaa     V       8.58%
W       none           Y       6.06%
stop    none

j    (1/ratio)^j    (ratio)^j      stop-free
1      2.0          .5             1.
2      4.0          .25            1.
3      8.0          .125           1.
4     16.0          .0625          1.
5     32.0          .03125         1.
6     64.0          .015625        1.
7    128.0          .0078125       1.
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Table 10: Abundances obtained 
from various vgCodon 
(continued) 



D. Optimized NNG

       T       C       A       G
1     .23     .21     .23     .33
2     .215    .285    .285    .215
3     .0      .0      .0      1.0

Amino                  Amino
acid    Abundance      acid    Abundance
A       9.40%          C       none
D       none           E       9.40%
F       none           G       7.10%
H       none           I       none
K       6.60%          L       9.50% mfaa
M       4.90%          N       none
P       6.00%          Q       6.00%
R       9.50%          S       6.60%
T       6.60%          V       7.10%
W       4.90% lfaa     Y       none
stop    6.60%

j    (1/ratio)^j    (ratio)^j      stop-free
1      1.9388       .51579         0.934
2      3.7588       .26604         0.8723
3      7.2876       .13722         0.8148
4     14.1289       .07078         0.7610
5     27.3929       3.65*10^-2     0.7108
6     53.109        1.88*10^-2     0.6639
7    102.96         9.72*10^-3     0.6200
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Table 10: Abundances obtained 
from optimum vgCodon 
(continued) 

E. Unoptimized NNS (NNK gives identical distribution)

       T      C      A      G
1     .25    .25    .25    .25
2     .25    .25    .25    .25
3     .0     .5     .0     .5

Amino                  Amino
acid    Abundance      acid    Abundance
A       6.25%          C       3.125%
D       3.125%         E       3.125%
F       3.125%         G       6.25%
H       3.125%         I       3.125%
K       3.125%         L       9.375%
M       3.125%         N       3.125%
P       6.25%          Q       3.125%
R       9.375%         S       9.375%
T       6.25%          V       6.25%
W       3.125%         Y       3.125%
stop    3.125%

j    (1/ratio)^j    (ratio)^j      stop-free
1      3.0          .33333         .96875
2      9.0          .11111         .9385
3     27.0          .03704         .90915
4     81.0          .01234567      .8807
5    243.0          .0041152       .8532
6    729.0          1.37*10^-3     .82655
7   2187.0          4.57*10^-4     .8007
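
The three numeric columns that close each part of Table 10 follow directly from the least-favored abundance, the most-favored abundance, and the stop-codon abundance. The sketch below reads (ratio)^j as the relative abundance of the least-favored j-residue sequence and "stop-free" as the fraction of j-codon stretches without a stop; that reading is an interpretation of the table, and the function name is illustrative.

```python
def codon_power_table(lfaa: float, mfaa: float, stop: float, j_max: int = 7):
    """Rows of (j, (1/ratio)^j, ratio^j, stop-free fraction)."""
    ratio = lfaa / mfaa
    return [(j, (1.0 / ratio) ** j, ratio ** j, (1.0 - stop) ** j)
            for j in range(1, j_max + 1)]


if __name__ == "__main__":
    # NNS: one codon (1/32) for the least-favored amino acids,
    # three codons (3/32) for the most-favored, one stop codon (1/32).
    for j, inverse, power, stop_free in codon_power_table(1 / 32, 3 / 32, 1 / 32):
        print(f"{j}  {inverse:8.1f}  {power:.3e}  {stop_free:.4f}")
```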
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Table 11: Calculate worst codon. 

Program "Find worst vgCodon within Serr of given distribution."
INITIALIZE-MEMORY-OF-ABUNDANCES
Comment Serr is % error level.
READ Serr
Comment T1i,C1i,A1i,G1i, T2i,C2i,A2i,G2i, T3i,G3i
Comment are the intended nt-distribution.
READ T1i, C1i, A1i, G1i
READ T2i, C2i, A2i, G2i
READ T3i, G3i
Fdwn = 1.-Serr
Fup = 1.+Serr
DO ( t1 = T1i*Fdwn to T1i*Fup in 7 steps)
. DO ( c1 = C1i*Fdwn to C1i*Fup in 7 steps)
. . DO ( a1 = A1i*Fdwn to A1i*Fup in 7 steps)
. . . g1 = 1. - t1 - c1 - a1
. . . IF( (g1-G1i)/G1i .lt. -Serr)
Comment g1 too far below G1i, push it back
. . . . g1 = G1i*Fdwn
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . end_IF_block
. . . IF( (g1-G1i)/G1i .gt. Serr)
Comment g1 too far above G1i, push it back
. . . . g1 = G1i*Fup
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . end_IF_block
. . . DO ( a2 = A2i*Fdwn to A2i*Fup in 7 steps)
. . . . DO ( c2 = C2i*Fdwn to C2i*Fup in 7 steps)
. . . . . DO ( g2 = G2i*Fdwn to G2i*Fup in 7 steps)
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF( (t2-T2i)/T2i .lt. -Serr)
Comment t2 too far below T2i, push it back
. . . . . . . t2 = T2i*Fdwn
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . end_IF_block
. . . . . . IF( (t2-T2i)/T2i .gt. Serr)
Comment t2 too far above T2i, push it back
. . . . . . . t2 = T2i*Fup
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
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Table 11, continued. 

. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . end_IF_block
. . . . . . IF( g2.gt.0.0 .and. t2.gt.0.0 )
. . . . . . . t3 = 0.5*(1.-Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5*(1.+Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . end_IF_block
. . . . . end_DO_loop ! g2
. . . . end_DO_loop ! c2
. . . end_DO_loop ! a2
. . end_DO_loop ! a1
. end_DO_loop ! c1
end_DO_loop ! t1

WRITE the WORST distribution and the abundances.
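
The "push it back" step that recurs in Table 11 clamps the dependent base fraction to within Serr of its intended value and rescales the other three so that the four fractions still sum to one. A minimal sketch of that single step follows; the function name `push_back` and its argument names are hypothetical.

```python
def push_back(dependent: float, others: list[float],
              intended: float, serr: float):
    """Clamp `dependent` to intended*(1 +/- serr) and renormalize `others`.

    Returns the (possibly clamped) dependent fraction and the rescaled
    list of the other three fractions; the four values sum to 1.
    """
    low, high = intended * (1.0 - serr), intended * (1.0 + serr)
    if low <= dependent <= high:
        return dependent, list(others)
    clamped = low if dependent < low else high
    factor = (1.0 - clamped) / sum(others)
    return clamped, [x * factor for x in others]


if __name__ == "__main__":
    # First-base fractions of Table 10A, each of t1, c1, a1 pushed up by 5%.
    t1, c1, a1 = 0.26 * 1.05, 0.18 * 1.05, 0.26 * 1.05
    g1 = 1.0 - t1 - c1 - a1
    g1, (t1, c1, a1) = push_back(g1, [t1, c1, a1], intended=0.30, serr=0.05)
    print(g1, t1, c1, a1)
```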
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Table 12 : Abundances obtained 
using optimum vgCodon assuming 
5% errors 



Amino                  Amino
acid    Abundance      acid    Abundance
A       4.59%          C       2.76%
D       5.45%          E       6.02%
F       2.49% lfaa     G       6.63%
H       3.59%          I       2.71%
K       5.73%          L       6.71%
M       3.00%          N       5.19%
P       3.02%          Q       3.97%
R       7.68% mfaa     S       7.01%
T       4.37%          V       6.00%
W       3.05%          Y       4.77%
stop    5.27%

ratio = Abun(F)/Abun(R) = 0.3248

j    (1/ratio)^j    (ratio)^j      stop-free
1      3.079        .3248          .9473
2      9.481        .1055          .8973
3     29.193        .03425         .8500
4     89.888        .01112         .8052
5    276.78         3.61*10^-3     .7627
6    852.22         1.17*10^-3     .7225
7   2624.1          3.81*10^-4     .6844
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Table 13: BPTI Homologues 



R # 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 
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Table 13, continued 

R # 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 

44 NNNNNNNNNNNRRRRNNRR

45 FFFFFFFFFFFFFFFFFFF

46 KKKEKKKKKKKKKKKEKDK

47 SSSTSSSSSSSTTTTTTTT

48 AAATAAAAAAAIIIIRKTI

49 EEEEEEEEEEEEEDDDAQE

50 DDDMDDDDDDDEEEEEEQE

51 CCCCCCCCCCCCCCCCCCC

52 MMMLMMMMMMERRRHRVQR

53 RRRRRRRRRRRRRRRERGR

54 TTTITTTTTTTTTTTTAVT

55 CCCCCCCCCCCCCCCCCCC

56 GGGEGGGGGGGIVVVGRVV

57 GGGPGGGGGGGRGGGGP-G

58 AAAPAAAAAAAK---KP--

59 ---Q------------E--

60 ---Q------------R--

61 ---T------------P--

62 ---D---------------

63 ---K---------------

64 ---S---------------

1 BPTI

2 Engineered BPTI From MARK87

3 Engineered BPTI From MARK87

4 Bovine Colostrum (DUFT85)

5 Bovine Serum (DUFT85)

6 Semisynthetic BPTI, TSCH87

7 Semisynthetic BPTI, TSCH87

8 Semisynthetic BPTI, TSCH87

9 Semisynthetic BPTI, TSCH87

10 Semisynthetic BPTI, TSCH87

11 Engineered BPTI, AUER87

12 Dendroaspis polylepis polylepis (Black mamba) venom I
(DUFT85)

13 Dendroaspis polylepis polylepis (Black mamba) venom K
(DUFT85)

14 Hemachatus hemachates (Ringhals Cobra) HHV II
(DUFT85)

15 Naja nivea (Cape cobra) NNV II (DUFT85)

16 Vipera russelli (Russell's viper) RVV II (TAKA74)

17 Red sea turtle egg white (DUFT85)

18 Snail mucus (Helix pomatia) (WAGN78)

19 Dendroaspis angusticeps (Eastern green mamba)
C13 S1 C3 toxin (DUFT85)

Table 13, continued

20 Dendroaspis angusticeps (Eastern green mamba)
C13 S2 C3 toxin (DUFT85)

21 Dendroaspis polylepis polylepis (Black mamba) B toxin
(DUFT85)

22 Dendroaspis polylepis polylepis (Black mamba) E toxin
(DUFT85)

23 Vipera ammodytes TI toxin (DUFT85)

24 Vipera ammodytes CTI toxin (DUFT85)

25 Bungarus fasciatus VIII B toxin (DUFT85)

26 Anemonia sulcata (sea anemone) 5 II (DUFT85)

27 Homo sapiens HI-14 "inactive" domain (DUFT85)

28 Homo sapiens HI-8 "active" domain (DUFT85)

29 beta bungarotoxin B1 (DUFT85)

30 beta bungarotoxin B2 (DUFT85)

31 Bovine spleen TI II (FIOR85)

32 Tachypleus tridentatus (Horseshoe crab) hemocyte
inhibitor (NAKA87)

33 Bombyx mori (silkworm) SCI-III (SASA84)

34 Bos taurus (inactive) BI-14

35 Bos taurus (active) BI-8
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Table 13, continued 

R # 36 37 38 39 40 

-5 - - - - -

-4 - - - - -

-3 - - - - -

-2 - - - - -

-1 - Z - - - 

1 R R R R R 

2 P P P P P 

3 D D D D D 

4 F F F F F 

5 C C C C C 

6 L L L L L 

7 E E E E E 

8 P P P P P 

9 P P P P P 

10 Y Y Y Y Y 

11 T T T T T 

12 G G G G G 

13 P P P P P 

14 C C C C C 

15 R K K K K 

16 A A A A A 

17 R R R R K 

18 I M I M M 

19 I I I I I 

20 R R R R R 

21 Y Y Y Y Y 

22 F F F F F 

23 Y Y Y Y Y 

24 N N N N N 

25 A A A A A 

26 K K K K K

27 A A A A A 

28 G G G G G 

29 L L L L F 

30 C C C C C 

31 Q Q Q Q E 

32 T P P P T 

33 F F F F F 

34 V V V V V 

35 Y Y Y Y Y 

36 G G G G G 

37 G G G G G 

38 C C C C C 

39 R R R R K 

40 A A A A A 

41 K K K K K 

42 R S R R S 

43 N N N N N 



R # 36 37 38 39 40 

44 N N N N N 

45 F F F F F 

46 K K K K R 

47 S S S S S 

48 A A S A A 

49 E E E E E

50 D D D D D 

51 C C C C C 

52 E M M M M 

53 R R R R R 

54 T T T T T 

55 C C C C C 

56 G G G G G 

57 G G G G G 

58 A A A A A 

59 - - - - - 

60 - - - - -

61 - - - - -
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Table 13, continued 



36: Engineered BPTI (KR15, ME52): Auerswald '88, Biol Chem
Hoppe-Seyler, 369 Supplement, pp27-35.
37: Isoaprotinin G-1: Siekmann, Wenzel, Schroder, and
Tschesche '88, Biol Chem Hoppe-Seyler, 369:157-163.
38: Isoaprotinin 2: Siekmann, Wenzel, Schroder, and
Tschesche '88, Biol Chem Hoppe-Seyler, 369:157-163.
39: Isoaprotinin G-2: Siekmann, Wenzel, Schroder, and
Tschesche '88, Biol Chem Hoppe-Seyler, 369:157-163.
40: Isoaprotinin 1: Siekmann, Wenzel, Schroder, and
Tschesche '88, Biol Chem Hoppe-Seyler, 369:157-163.
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Table 13, continued 

Notes : 

a) both beta bungarotoxins have residue 15 deleted.

b) B. mori has an extra residue between C5 and C14; we
have assigned F and G to residue 9.

c) all natural proteins have C at 5, 14, 30, 38, 50, & 55.

d) all homologues have F33 and G37.

e) extra C's in bungarotoxins form interchain cystine
bridges.
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Identification codes for Tables 14 and 15 

1 BPTI

2 synthetic BPTI, Tan & Kaiser, Biochem. 16(8)1531-41

3 Semisynthetic BPTI, TSCH87

4 Semisynthetic BPTI, TSCH87

5 Semisynthetic BPTI, TSCH87

6 Semisynthetic BPTI, TSCH87

7 Semisynthetic BPTI, TSCH87

8 Engineered BPTI, AUER87

9 BPTI Auerswald &al GB 2 208 511A

10 BPTI Auerswald &al GB 2 208 511A

11 Engineered BPTI From MARK87

12 Engineered BPTI From MARK87

13 BPTI (KR15, ME52): Auerswald '88, Biol Chem Hoppe-Seyler,
369 Suppl, pp27-35.

14 BPTI CA30/CA51 Eigenbrot &al, Protein Engineering
3(7)591-598 ('90)

15 Isoaprotinin 2 Siekmann et al '88, Biol Chem
Hoppe-Seyler, 369:157-163.

16 Isoaprotinin G-2: Siekmann et al '88, Biol Chem
Hoppe-Seyler, 369:157-163.

17 BPTI Engineered, Auerswald &al GB 2 208 511A

18 BPTI Engineered, Auerswald &al GB 2 208 511A

19 BPTI Engineered, Auerswald &al GB 2 208 511A

20 Isoaprotinin G-1 Siekmann &al '88, Biol Chem
Hoppe-Seyler, 369:157-163.

21 BPTI Engineered, Auerswald &al GB 2 208 511A

22 BPTI Engineered, Auerswald &al GB 2 208 511A

23 Bovine Serum (in Dufton '85)

24 Bovine spleen TI II (FIOR85)

25 Snail mucus (Helix pomatia) (WAGN78)

26 Hemachatus hemachates (Ringhals Cobra) HHV II (in Dufton
'85)

27 Red sea turtle egg white (in Dufton '85)

28 Bovine Colostrum (in Dufton '85)

29 Naja nivea (Cape cobra) NNV II (in Dufton '85)

30 Bungarus fasciatus VIII B toxin (in Dufton '85)

31 Vipera ammodytes TI toxin (in Dufton '85)

32 Porcine ITI domain 1, (in CREI87)

33 Human Alzheimer's beta APP protease inhibitor, (SHIN90)

34 Equine ITI domain 1, in Creighton & Charles

35 Bos taurus (inactive) BI-8e (ITI domain 1)

36 Anemonia sulcata (sea anemone) 5 II (in Dufton '85)

37 Dendroaspis polylepis polylepis (Black mamba) E toxin (in
Dufton '85)

38 Vipera russelli (Russell's viper) RVV II (TAKA74)

39 Tachypleus tridentatus (Horseshoe crab) hemocyte
inhibitor (NAKA87)

40 LACI 2 (Factor Xa) (WUNT88)

41 Vipera ammodytes CTI toxin (in Dufton '85)
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Identification codes for Tables 14 and 15 

42 Dendroaspis polylepis polylepis (Black mamba) venom K (in
Dufton '85)

43 Homo sapiens HI-8e "inactive" domain (in Dufton '85)

44 Green Mamba toxin K, (in CREI87)

45 Dendroaspis angusticeps (Eastern green mamba) C13 S1 C3
toxin (in Dufton '85)

46 LACI 3

47 Equine ITI domain 2, (CREI87)

48 LACI 1 (VIIa)

49 Dendroaspis polylepis polylepis (Black mamba) B toxin (in
Dufton '85)

50 Porcine ITI domain 2, Creighton and Charles

51 Homo sapiens HI-8t "active" domain (in Dufton '85)

52 Bos taurus (active) BI-8t

53 Trypstatin Kito &al ('88) J Biol Chem 263(34)18104-07.

54 Dendroaspis angusticeps (Eastern green mamba) C13 S2 C3
toxin (in Dufton '85)

55 Green Mamba I venom Creighton & Charles '87 CSHSQB
52:511-519.

56 beta bungarotoxin B2 (in Dufton '85)

57 Dendroaspis polylepis polylepis (Black mamba) venom I (in
Dufton '85)

58 beta bungarotoxin B1 (in Dufton '85)

59 Bombyx mori (silkworm) SCI-III (SASA84)
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Table 14: Tally of Ionizable groups 



Identifier    D    E    K    R    Y    H    NH   CO2    +   ions

     1        2    2    4    6    4    0    1    1      6    16
     2        2    2    4    6    4    0    1    1      6    16
     3        2    2    3    6    4    0    1    1      5    15
     4        2    2    3    6    4    0    1    1      5    15
     5        2    2    3    6    4    0    1    1      5    15
     6        2    2    3    6    4    0    1    1      5    15
     7        2    2    3    6    4    0    1    1      5    15
     8        2    3    4    6    4    0    1    1      5    17
     9        2    2    3    5    4    0    1    1      4    14
    10        2    3    3    6    4    0    1    1      4    16
    11        2    2    4    6    4    0    1    1      6    16
    12        2    2    4    6    4    0    1    1      6    16
    13        2    3    3    7    4    0    1    1      5    17
    14        2    2    4    6    4    0    1    1      6    16
    15        2    2    4    6    4    0    1    1      6    16
    16        2    2    4    6    4    0    1    1      6    16
    17        2    2    3    5    4    0    1    1      4    14
    18        2    3    3    5    4    0    1    1      3    15
    19        2    3    3    5    4    0    1    1      3    15
    20        2    2    4    5    4    0    1    1      5    15
    21        2    3    3    4    4    0    1    1      2    14
    22        2    4    3    4    4    0    1    1      1    15
    23        2    4    4    4    4    0    1    1      2    16
    24        2    3    5    4    4    0    1    1      4    16
    25        1    1    2    4    4    0    1    1      4    10
    26        2    3    2    5    3    1    1    1      2    14
    27        2    4    6    8    3    0    1    1      8    22
    28        2    4    2    3    3    0    1    1     -1    13
    29        1    4    2    7    2    2    1    1      4    16
    30        1    2    5    3    4    2    1    1      5    13
    31        4    1    5    3    4    2    1    1      3    15
    32        1    4    3    2    4    1    1    1      0    12
    33        2    6    1    5    3    0    1    1     -2    16
    34        2    4    2    2    3    1    1    1     -2    12
    35        2    2    3    2    4    0    1    1      1    11
    36        1    5    4    5    4    1    1    1      3    17
    37        0    2    6    3    3    3    1    1      7    13
    38        2    5    3    7    3    2    1    1      3    19
    39        3    3    5    5    4    0    1    1      4    18
    40        3    7    4    3    4    0    1    1     -3    19
    41        3    2    4    6    5    1    1    1      5    17
    42        1    2    8    5    4    0    1    1     10    18
    43        1    4    2    2    4    0    1    1     -1    11
    44        1    2    9    4    5    0    1    1     10    18
    45        0    2    8    4    5    0    1    1     10    16
    46        1    3    5    5    3    0    1    1      6    16
    47        3    4    4    3    3    0    1    1      0    16
    48        3    6    5    4    1    1    1    1      0    20
    49        0    3    3    5    5    0    1    1      5    13
    50        2    6    4    2    3    0    1    1     -2    16
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Table 14: Tally of Ionizable groups 



Identifier    D    E    K    R    Y    H    NH   CO2    +   ions

    51        2    4    4    3    3    0    1    1      1    15
    52        1    4    6    2    3    0    1    1      3    15
    53        2    2    5    1    4    0    1    1      2    12
    54        2    3    6    8    3    1    1    1      9    21
    55        1    3    6    7    3    1    1    1      9    19
    56        6    2    6    7    4    3    1    1      5    23
    57        0    3    7    7    3    1    1    1     11    19
    58        6    2    5    7    4    2    1    1      4    22
    59        4    7    3    1    4    0    1    1     -7    17
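
Each row of Table 14 can be derived from a sequence by counting residues. In the rows above, the "+" column equals K + R + NH - D - E - CO2 and the "ions" column equals D + E + K + R + NH + CO2; the sketch below applies that reading, which is inferred from the table rather than stated in it. The function name `tally` is illustrative, and the BPTI sequence is spelled out from residues 1 to 58 of Table 13 (Lys at 15, Met at 52).

```python
def tally(sequence: str) -> dict:
    """Count ionizable groups in a single-chain protein sequence."""
    seq = sequence.upper()
    counts = {aa: seq.count(aa) for aa in "DEKRYH"}
    counts["NH"] = 1   # free amino terminus (assumed unblocked)
    counts["CO2"] = 1  # free carboxy terminus
    counts["+"] = (counts["K"] + counts["R"] + counts["NH"]
                   - counts["D"] - counts["E"] - counts["CO2"])
    counts["ions"] = (counts["D"] + counts["E"] + counts["K"]
                      + counts["R"] + counts["NH"] + counts["CO2"])
    return counts


BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"

if __name__ == "__main__":
    print(tally(BPTI))
```

Tyrosine and histidine are counted but, as in the table, do not enter the "+" or "ions" totals.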
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Table 15: Frequency of Amino Acids at Each Position 

in BPTI and 58 Homologues

Res.   Different
Id.    AAs         Contents                                   First

-5      2    -58 D
-4      2    -58 E
-3      5    -55 P T Z F
-2     10    -43 R3 Z3 Q3 T2 E G H K L
-1     11    -41 D4 P3 R2 T2 Q2 G K N Z E
 1     13    R35 K6 T4 A3 H2 G2 L M N P I D -                  R
 2     10    P35 R6 A4 V4 H3 E3 N F I L                        P
 3     11    D32 K8 S4 A3 T3 R2 E2 P2 G L Y                    D
 4      9    F34 A6 D4 L4 S4 Y3 I2 W V                         F
 5      1    C59                                               C
 6     13    L25 N7 E6 K4 Q4 I3 D2 S2 Y2 R F T A               L
 7      7    L28 E25 K2 F Q S T                                E
 8     10    P46 H3 D2 G2 E I K L A Q                          P
 9     12    P30 A9 I4 V4 R3 Y3 L F Q H E K                    P
 9a     2    -58 G
10      9    Y24 E8 D8 V6 R3 S3 A3 N3 I                        Y
11     11    T31 Q8 P7 R3 A3 Y2 K S D V I                      T
12      2    G58 K                                             G
13      5    P45 R7 L4 I2 N                                    P
14      3    C57 A T                                           C
15     12    K22 R12 L7 V6 Y3 M2 -2 N I A F G                  K
16      7    A41 G9 F2 D2 K2 Q2 R                              A
17     14    R19 L8 K7 F5 M4 Y4 H2 A2 S2 G2 I N T P            R
18      8    I41 M7 F4 L2 V2 E T A                             I
19     10    I24 P12 R8 K5 S4 Q2 L N E T                       I
20      5    R39 A8 L6 S5 Q                                    R
21      5    Y35 F17 W5 I L                                    Y
22      6    F32 Y18 A5 H2 S N                                 F
23      2    Y52 F7                                            Y
24      4    N47 D8 K3 S                                       N
25     13    A29 S6 Q4 G4 W4 P3 T2 L2 R N K V I                A
26     11    K31 A9 T5 S3 V3 R2 E2 G H F Q                     K
27      8    A32 S11 K5 T4 Q3 L2 I E                           A
28      7    G32 K13 N5 M4 Q2 R2 H                             G
29     10    L22 K13 Q11 A5 F2 R2 N G M T                      L
30      2    C58 A                                             C
31     10    Q25 E17 L5 V5 K2 N A R I Y                        Q
32     11    T25 P11 K4 Q4 L4 R3 E3 G2 S A V                   T
33      1    F59                                               F
34     13    V24 I10 T5 N3 Q3 D3 K3 F2 H2 R S P L              V

35 2 Y56 m Y 

36      3    G50 S8 R                                          G
37      1    G59                                               G
38      3    C57 A T                                           C
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Table 15: Frequency of Amino Acids at Each Position 
in BPTI and 58 Homologues (continued) 

Res.   Different
Id.    AAs         Contents                                   First




Q 




m OA M3 L2 D2 P 


R 


AH 
ft u 








A 


A T 


3 




1\Z -4: UZ» 


K 


A 1 




XV ^ Z 


aio rift O? H2 M D E "K T, 


R 


ft J 


2 


N57 




N 


44      3    N40 R14 K5                                        N
45      2    F58 Y                                             F
46     11    K39 Y5 E4 S2 V2 D2 R H T A L                      K
47      2    S36 T23                                           S
48     11    A23 I11 E6 Q6 L4 K2 T2 W2 S D R                   A
49      8    E37 K8 D6 Q3 A2 P H T                             E
50      7    E27 D25 K2 L2 M Q Y                               D
51      2    C58 A                                             C
52      9    M17 R15 E8 L7 K6 Q2 T2 H V                        M
53     11    R37 E6 Q5 K2 C2 H2 A N G D W                      R
54      8    T41 Y5 A4 V3 I2 E2 M K                            T
55      1    C59                                               C
56     10    G33 V9 R5 I4 E3 L A S T K                         G
57     12    G34 V6 -5 A3 R2 I2 P2 D K S L N                   G
58     10    A25 -15 P7 K3 S2 Y2 G2 F D R                      A
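
The "Contents" strings of Table 15 are per-position tallies over an alignment, written most frequent first with the count omitted when it is one. A minimal sketch of that formatting follows; `position_summary` is a hypothetical name, and the example column is residue 15 of homologues 36 to 40 in Table 13.

```python
from collections import Counter


def position_summary(column: str) -> str:
    """Summarize one alignment column in the style of Table 15."""
    counts = Counter(column)
    # Most frequent first; ties keep the order of first appearance.
    ordered = sorted(counts.items(), key=lambda item: -item[1])
    return " ".join(aa if n == 1 else f"{aa}{n}" for aa, n in ordered)


if __name__ == "__main__":
    column_15 = "RKKKK"  # residue 15 in homologues 36-40 of Table 13
    print(len(set(column_15)), position_summary(column_15))
```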
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Table 16: Exposure in BPTI 
Coordinates taken from 

Brookhaven Protein Data Bank entry 6PTI. 

HEADER PROTEINASE INHIBITOR (TRYPSIN) 13-MAY- 
COMPND BOVINE PANCREATIC TRYPSIN INHIBITOR 

COMPND 2 (/BPTI$, CRYSTAL FORM /III$) 
AUTHOR A. WLODAWER 

Solvent radius = 1.40 
Atomic radii given in Table 7 

Areas in Å². 



             Total     Not covered             Not covered
Residue      area      by M/C       fraction   at all       fraction

ARG   1      342.45    205.09       0.5989     152.49       0.4453
PRO   2      239.12     92.65       0.3875      47.56       0.1989
ASP   3      272.39    158.77       0.5829     143.23       0.5258
PHE   4      311.33    137.82       0.4427      43.21       0.1388
CYS   5      241.06     48.36       0.2006       0.23       0.0010
LEU   6      280.98    151.45       0.5390     115.87       0.4124
GLU   7      291.39    128.91       0.4424      90.39       0.3102
PRO   8      236.12    128.71       0.5451      99.98       0.4234
PRO   9      236.09    109.82       0.4652      45.80       0.1940
TYR  10      330.97    153.63       0.4642      79.49       0.2402
THR  11      249.20     80.10       0.3214      64.99       0.2608
GLY  12      184.21     56.75       0.3081      23.05       0.1252
PRO  13      240.07    130.25       0.5426      75.27       0.3136
CYS  14      237.10     75.55       0.3186      53.52       0.2257
LYS  15      310.77    200.25       0.6444     192.00       0.6178
ALA  16      209.41     66.63       0.3182      45.59       0.2177
ARG  17      351.09    243.67       0.6940     201.48       0.5739
ILE  18      277.10    100.51       0.3627      58.95       0.2127
ILE  19      278.03    146.06       0.5254      96.05       0.3455
ARG  20      339.11    144.65       0.4266      43.81       0.1292
TYR  21      333.60    102.24       0.3065      69.67       0.2089
PHE  22      306.08     70.64       0.2308      23.01       0.0752
TYR  23      338.66     77.05       0.2275      17.34       0.0512
ASN  24      264.88     99.03       0.3739      38.69       0.1461
ALA  25      211.15     85.13       0.4032      48.20       0.2283
LYS  26      313.29    216.14       0.6899     202.84       0.6474
ALA  27      210.66     96.05       0.4560      54.78       0.2601
GLY  28      186.83     71.52       0.3828      32.09       0.1718
LEU  29      280.70    132.42       0.4718      93.61       0.3335
CYS  30      238.15     57.27       0.2405      19.33       0.0812
GLN  31      301.15    141.80       0.4709      82.64       0.2744
THR  32      251.26    138.17       0.5499      76.47       0.3043
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Table 16, continued. 



Jbrxlr* 


7 7 


o ha on 


59 


.79 


0 . 


1965 


18.91 


0. 


0622 


VAJU 


7 A 
j4 




109.78 


0* 


4364 


42.36 


0.1684 


mVD 


7 5^ 


719 fiA 


q n 


KO 
• DZ 


0 . 


2421 


15.05 


0 . 


0452 




O O 


T Q7 n^ 


XX 


q n 
• y u 


0 . 


0636 


1.97 


0 . 


0105 


GliX 


7*7 


1 Q C Oft 

lOJ . Z o 


Q A 

o4 


• ZD 


n 


4548 

** *J *x o 


39.17 


o 


2114 






Z O 4 * ju 




• t>4 


0 . 


3139 


26.40 


n 




AKLr 




4 X / • IJ 


*3 Pi A 
J U4 


AO 

* oz 


o 

\J » 


7303 


250. 73 


o . 


6011 


AT A 


A f\ 


•5 no t\7 


Q A 

y 4 


m 

. Ul 


0 . 


4487 


52 . 95 


o . 


2527 


T VC 


4 X 


n a fin 

J X*i • O U 


loo 


O 7 


0 . 


5284 


108.77 


0 . 


3457 




4Z 


7 aq n fi 


Z «3Z 


C7 


0 . 


6670 


179 . 59 


0 . 


5145 




A *3 


Z OO . 4 / 


38 


.53 




1446 


5.32 


0. 


0200 


ASN 


44 


z oy • oo 


91 


.08 


0 • 


3378 


23 . 39 


0. 


0867 


PHE  45      313.22     69.73       0.2226      14.79       0.0472
LYS  46      309.83    217.18       0.7010     155.73       0.5026
SER  47      224.78     69.11       0.3075      24.80       0.1103
ALA  48      211.01     82.06       0.3889      31.07       0.1473
GLU  49      286.62    161.00       0.5617     100.01       0.3489
ASP  50      299.53    156.42       0.5222      95.96       0.3204
CYS  51      238.68     24.51       0.1027       0.00       0.0000
MET  52      293.05     89.48       0.3054      66.70       0.2276
ARG  53      356.20    224.61       0.6306     189.75       0.5327
THR  54      251.53    116.43       0.4629      51.64       0.2053
CYS  55      240.40     69.95       0.2910       0.00       0.0000
GLY  56      184.66     60.79       0.3292      32.78       0.1775
GLY  57      106.58     49.71       0.4664      38.28       0.3592
ALA  58      no position given in Protein Data Bank



"Total area" is the area measured by a rolling sphere of 
radius 1.4 Å, where only the atoms within the 
residue are considered. This takes account of 
conformation. 

"Not covered by M/C" is the area measured by a rolling sphere 
of radius 1.4 Å where all main-chain atoms are 
considered; fraction is the exposed area 
divided by the total area. Surface buried by 
main-chain atoms is more definitely covered 
than is surface covered by side group atoms. 

"Not covered at all" is the area measured by a rolling sphere 
of radius 1.4 Å where all atoms of the 
protein are considered. 
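
The two "fraction" columns of Table 16 follow directly from the definition above (exposed area divided by total area). As an illustrative sketch only, with the hypothetical helper name `exposure_fractions`, the tabulated values for a residue can be reproduced as follows:

```python
def exposure_fractions(total_area: float,
                       not_covered_by_mc: float,
                       not_covered_at_all: float) -> tuple[float, float]:
    """Return the two Table 16 fractions, rounded to four decimals.

    Each fraction is the exposed area (in square Angstroms) divided by the
    residue's total area, as stated in the notes to Table 16.
    """
    return (round(not_covered_by_mc / total_area, 4),
            round(not_covered_at_all / total_area, 4))


if __name__ == "__main__":
    # ARG 1 and PRO 2, using the areas listed in Table 16.
    print(exposure_fractions(342.45, 205.09, 152.49))
    print(exposure_fractions(239.12, 92.65, 47.56))
```

Recomputing the fractions in this way is also a convenient check on the transcription of the area columns.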
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Table 17: Plasmids used in Detailed Example I 
Phage    Contents 

LG1      M13mp18 with Ava II/Aat II/Acc I/Rsr II/Sau I adaptor 
pLG2     LG1 with amp R and ColE1 of pBR322 cloned into Aat II/Acc I sites 
pLG3     pLG2 with Acc I site removed 
pLG4     pLG3 with first part of osp-pbd gene cloned into Rsr II/Sau I sites, Avr II/Asu II sites created 
pLG5     pLG4 with second part of osp-pbd gene cloned into Avr II/Asu II sites, BssH I site created 
pLG6     pLG5 with third part of osp-pbd gene cloned into Asu II/BssH I sites, Bbe I site created 
pLG7     pLG6 with last part of osp-pbd gene cloned into Bbe I/Asu II sites 
pLG8     pLG7 with disabled osp-pbd gene, same length DNA. 
pLG9     pLG7 mutated to display BPTI (V15 BPTI) 
pLG10    pLG8 + tet R gene - amp R gene 
pLG11    pLG9 + tet R gene - amp R gene 
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Table 18: Enzyme sites eliminated when 
M13mp18 is cut by Ava II 
and Bsu36I 

AhaII     Fsp I     EcoRI     SmaI      HindIII   HindII 
NarI      Bal I     SacI      BamH I    AccI 
GdiII     HgiE II   Kpn I     XbaI      PstI 
Pvu I     Bsu36I    Xma I     SalI 



20 



Table 19: Enzymes not cutting 
M13mp18 

Aat II    BbvII     BstBI     Eco57I    Esp I     Nhe I     PflMI     Rsr I     SpeI      XcaI 
Afl II    Bcl I     BstE II   EcoN I    HpaI      NotI      PmaCI     SacI      StuI      XhoI 
Apa I     BspM I    BstXI     EcoO109I  Mlu I     Nru I     Ppa I     Sca I     StyI 
Avr II    BssHI     Eag I     EcoRV     NcoI      NsiI      PpuM I    Sfi I     Tth111I 
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Table 20: Enzymes cutting 
Amp R gene and ori 



AatII     BbvII     Eco57I    PpaI      ScaI      Tth111I 
AhaII     GdiII     PvuI      FspI      BalI      HgiEII 
HindII    PstI      XbaI      AflIII    NdeI 
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Table 21: Enzymes tested on Ambig DNA 







Enzvme 


Recocrnition 


Symm cuts 




SutyDlv 




5 


%AccI 


GTMKAC 


p 


p 




4 


<B,M f I,N,P,T 






Aflll 


CTTAAG 


p 

XT 


1 
_L 




5 


<N 






Apal 


GGGCCC 


P 

X 


K 




1 


<M , I , N r P , T 






AsuII 


TTCGAA 


P 

XT 


£> 


oc 


4 


<P,N(BstBI) 






Avalll 


ATGCAT 


P 

XT 


R 


oc 


1 


<T; NsiI:M N P Tr 




10 














EcoT22X : T 

-I-J -L» A« -u ^JL. » JL 






Avrll 


CCTAGG 

**** JW AW \ J 


P 

XT 


*i 
X 


oc 


5 


<N 






BamHT 

l/U ALU A >X 


GGATCC 


T) 

XT 


n 
X 


oc 


5 


<S.B.M.I.N.P.T 






Bell 


TGATCA 


T> 
X 


X 


oc 


5 


<S B M T N T 








TCCGGA 


"D 


X 


oc 


5 


<N 




15 


BssHTT 




±r 


X 


OC 


5 


<N,T 






+BstEII 

* JJw A-l JL J. 


GGTNACC 


P 

X 


X 


oc 




<S B M N T 

^ *— ' f U f XX f A* ^ X 






- %BstXT 

O XJ w l#^LX, 


CCANNNNN 


P 
X 


p 
o 


oc 


4 


<N P T 

^vXl , x- f X 






+DraII 


RGGNCCY 


Jr 




r 

oc 


5 


<M T ; EcoO109I«N 


o 




+EcoNT 


CCTNNNNN 

v>V111 Vt Xi Xi 


ir 


O 


r 

oc 




^»Xl ^ DUUil y 


Ml 




RrnRT 

J_i <w UI\X 




T) 

Ir 


X 


6c 


IS 


•^c; R M T N P T 


m 




EcoRV 


GATATC 


Jr 




OC 


3 


<S R M T N P T 

>U ^ U;il^ X j XI f -XT f X 






+EsgI 


GCTNAGC 


P 


2 


& 


5 


<T 






Hindlll 


AAGCTT 


P 


1 


& 


5 


<S / B,M / I,N / P,T 






Hpal 


GTTAAC 


r* 




r 
OC 


3 


<S / B / M / I,N / P f T 




25 


KnnI 


GGTACC 


■n 

XT 


D 


OC 


1 


<S B M T N P T r 


SI 
















Asr>718 *M 






Mlul 


ACGCGT 


P 


X 


£ 
OC 


5 


<M,N. P,T 






Narl 


GGCGCC 


p 




£ 

OC 


4 


<B,N / T 






Ncol 


CCATGG 


P 

XT 


X 


£ 
oc 


5 


<B -M.N , P . T 




30 


Nhel 


GCTAGC 


T5 

XT 


X 


r, 
oc 


5 


<M , N . P . T 


n 1 




NotI 


GCGGCCGC 


P 

X 




£ 
oc 


a 


W P T 

^X"! f XX f XT f X 






Nrul 


TCGCGA 


P 

Jr 




£ 
oc 


3 


<B M N T 

^» XJ j 11 j XI ^ X 




■ 


+PflMI    CCANNNNN         P   7 & 4   <N 
PmaCI     CACGTG           P   3 & 3   <none 
+PpuMI    RGGWCCY          P   2 & 5   <N 
+RsrII    CGGWCCG          P   2 & 5   <N,T 
SacI      GAGCTC           P   5 & 1   <B(SstI),M,I,N,P,T 
SalI      GTCGAC           P   1 & 5   <B,M,I,N,P,T 
+SauI     CCTNAGG          P   2 & 5   <M; CvnI:B; MstII:T; Bsu36I:N; AocI 
+SfiI     GGCCNNNNNGGCC    P   8 & 5   <N,P,T 
SmaI      CCCGGG           P   3 & 3   <B,M,I,N,P,T 
SpeI      ACTAGT           P   1 & 5   <M,N,T 
SphI      GCATGC           P   5 & 1   <B,M,I,N,P,T 
StuI      AGGCCT           P   3 & 3   <M,N,I(AatI),P,T 
%StyI     CCWWGG           P   1 & 5   <N,P,T 
XcaI      GTATAC           P   3 & 3   <N(soon) 
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Table 21, continued. 



XhoI      CTCGAG           P   1 & 5   <B,M,I,P,T; CcrI:T; PaeR7I:N 
XmaI      CCCGGG           P   1 & 5   <I,N,P,T 
XmaIII    CGGCCG           P   1 & 5   <B; EagI:N; Eco52I:T 

N restrct = 43 
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Table 22: ipbd gene 



pbd mod10 29III88 : 

lacUV5 RsrII/AvrII/gene/TrpA attenuator/MstII; ! 

5'- CGGaCCG TaT                                       ! RsrII site 
CCAGGC tttaca CTTTATGCTTCCGGCTCG tataat GTG           ! lacUV5 
TGG aATTGTGAGCGGATAACAATT                             ! lacO operator 
CCT AGGAgg CtcaCT                                     ! Shine-Dalgarno seq. 
atg aag aaa tct ctg gtt ctt aag gct agc               ! 10, M13 leader 
gtt gct gtc gcg acc ctg gta ccg atg ctg               ! 20 
tct ttt gct cgt ccg gat ttc tgt ctc gag               ! 30 
ccg cca tat act ggg ccc tgc aaa gcg cgc               ! 40 
atc atc cgt tat ttc tac aac gct aaa gca               ! 50 
ggc ctg tgc cag acc ttt gta tac ggt ggt               ! 60 
tgc cgt gct aag cgt aac aac ttt aaa tcg               ! 70 
gcc gaa gat tgc atg cgt acc tgc ggt ggc               ! 80 
gcc gct gaa ggt gat gat ccg gcc aaa gcg               ! 90 
gcc ttt aac tct ctg caa gct tct gct acc               ! 100 
gaa tat atc ggt tac gcg tgg gcc atg gtg               ! 110 
gtg gtt atc gtt ggt gct acc atc ggt atc               ! 120 
aaa ctg ttt aag aaa ttt act tcg aaa gcg               ! 130 
tct taa tag tga ggttacc                               ! BstE II 
agtcta agcccgc ctaatga gcgggct tttttttt               ! terminator 
CCTgAGG -3'                                           ! MstII 
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Table 23: ipbd DNA sequence 

DNA Sequence file = UV5_M13PTIM13 .DNA; 17 
DNA Sequence title = 
pbd mod10 29III88 : lac-UV5 RsrII/AvrII/gene/TrpA 
attenuator/MstII ; ! 



  1   C GGA CCG TAT CCA GGC TTT ACA CTT TAT GCT TCC GGC TCG 
 41   TAT AAT GTG TGG AAT TGT GAG CGG ATA ACA ATT CCT AGG AGG 
 83   CTC ACT ATG AAG AAA TCT CTG GTT CTT AAG GCT AGC GTT GCT 
125   GTC GCG ACC CTG GTA CCG ATG CTG TCT TTT GCT CGT CCG GAT 
167   TTC TGT CTC GAG CCG CCA TAT ACT GGG CCC TGC AAA GCG CGC 
209   ATC ATC CGT TAT TTC TAC AAC GCT AAA GCA GGC CTG TGC CAG 
251   ACC TTT GTA TAC GGT GGT TGC CGT GCT AAG CGT AAC AAC TTT 
293   AAA TCG GCC GAA GAT TGC ATG CGT ACC TGC GGT GGC GCC GCT 
335   GAA GGT GAT GAT CCG GCC AAA GCG GCC TTT AAC TCT CTG CAA 
377   GCT TCT GCT ACC GAA TAT ATC GGT TAC GCG TGG GCC ATG GTG 
419   GTG GTT ATC GTT GGT GCT ACC ATC GGT ATC AAA CTG TTT AAG 
461   AAA TTT ACT TCG AAA GCG TCT TAA TAG TGA GGT TAC CAG TCT 
503   AAG CCC GCC TAA TGA GCG GGC TTT TTT TTT CCT GAG G 





Total = 539 bases 
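
The sequence of Table 23 can be checked against the codon numbering used in the annotated tables by translating it with the standard genetic code. The sketch below is illustrative only and not part of the original disclosure: the names `IPBD_ROWS`, `CODON_TABLE` and `translate` are hypothetical, the start position 89 is read from the row beginning at base 83 in Table 23, and the code assumes that translation stops at the first in-frame stop codon.

```python
# Sequence as listed in Table 23 (539 bases), one table row per string.
IPBD_ROWS = [
    "C GGA CCG TAT CCA GGC TTT ACA CTT TAT GCT TCC GGC TCG",
    "TAT AAT GTG TGG AAT TGT GAG CGG ATA ACA ATT CCT AGG AGG",
    "CTC ACT ATG AAG AAA TCT CTG GTT CTT AAG GCT AGC GTT GCT",
    "GTC GCG ACC CTG GTA CCG ATG CTG TCT TTT GCT CGT CCG GAT",
    "TTC TGT CTC GAG CCG CCA TAT ACT GGG CCC TGC AAA GCG CGC",
    "ATC ATC CGT TAT TTC TAC AAC GCT AAA GCA GGC CTG TGC CAG",
    "ACC TTT GTA TAC GGT GGT TGC CGT GCT AAG CGT AAC AAC TTT",
    "AAA TCG GCC GAA GAT TGC ATG CGT ACC TGC GGT GGC GCC GCT",
    "GAA GGT GAT GAT CCG GCC AAA GCG GCC TTT AAC TCT CTG CAA",
    "GCT TCT GCT ACC GAA TAT ATC GGT TAC GCG TGG GCC ATG GTG",
    "GTG GTT ATC GTT GGT GCT ACC ATC GGT ATC AAA CTG TTT AAG",
    "AAA TTT ACT TCG AAA GCG TCT TAA TAG TGA GGT TAC CAG TCT",
    "AAG CCC GCC TAA TGA GCG GGC TTT TTT TTT CCT GAG G",
]
IPBD = "".join(IPBD_ROWS).replace(" ", "")

# Standard genetic code, built from the usual compact TCAG-ordered string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, start: int) -> str:
    """Translate from 1-based position `start` up to the first stop codon."""
    protein = []
    for pos in range(start - 1, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# The initiating ATG occupies bases 89-91 (third codon of the row starting at 83).
PROTEIN = translate(IPBD, 89)

if __name__ == "__main__":
    print(len(IPBD))        # total number of bases
    print(PROTEIN[:23])     # leader, codons 1-23
    print(PROTEIN[23:81])   # codons 24-81
```

Under these assumptions codons 24 through 81 translate to the 58-residue BPTI sequence, and the reading frame ends at the TAA that follows codon 131.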
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Table 24: Summary of Restriction Cuts 



Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 



% Acc I has 
Acc III has 
Acv I has 
Afl II has 
% Afl III has 
Aha III has 
Apa I has 
Asp718 has 
Asu II has 
%Ava I has 
Avr II has 
% Ban I has 
Bbe I has 
•f Bal - 1 has 
+Bin I has . 
% BspM I has 
BssH II has 
+ BstE II has 
%BstX I has 
Cfr I has 
+Dra II has 
+ Esp I has 



1 observed sites : 259 
1 observed sites : 162 
1 observed sites : 328 
1 observed sites : 109 



1 observed sites : 404 



1 
1 
1 

3 



% Fok I 

Gdi II 
Hae I 



Hae II 
+Hga I 
% HaiC I 
% HaiJ II 
Hind III 
+ Hph I has 
Kpn I has 
+Mbo II has 



has 
has 
has 
has 
has 
has 
has 
has 



1 observed sites 
1 observed sites : 
1 observed sites 
observed sites 
observed sites 
observed sites 
observed sites 

1 observed sites : 
1 observed sites : 
.1 observed sites : 

1 observed sites 
1 observed sites 

1 observed sites 
1 observed sites 

2 observed sites : 
1 observed sites 

1 observed sites 

1 observed sites 

2 observed sites 
1 observed sites : 

1 observed sites 
1 observed sites 
3 observed sites : 
1 observed sites : 
1 observed sites : 



: 292 
193 
138 
471 
175 
76 

138 328 540 
328 
352 
346 
: 319 
: 205 

493 
: 413 
299 350 
: 193 
277 
213 

299 350 
240 
328 
478 

: 138 328 540 
193 
377 




Stu I 
% Sty I 



1 observed sites : 340 
"observed' sites : 138 
2 observed sites : 93 304 
observed sites : 404 
observed sites : 328 
observed sites : 413 
observed sites : 115 
observed sites : 128 

1 observed sites : 311 
1 observed sites : 332 
1 observed sites : 184 
1 observed sites : 193 

1 observed sites : 3 
1 observed sites : 535 

2 observed sites : 144 209 

1 observed sites : 351 
1 observed sites : 311 
1 observed sites : 240 

2 observed sites : 76 413 
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Table 24, continued. 

Enz = Xca I has 1 observed sites : 259 
Enz = Xho I has 1 observed sites : 175 
Enz = Xma III has 1 observed sites : 299 

Enzymes that do not cut 

Aat II      AlwN I      ApaL I      Ase I       Ava III 
Bal I       BamH I      Bbv I       Bbv II      Bcl I 
Bgl II      Bsm I       BspH I      Cla I       Dra III 
Eco47 III   EcoN I      EcoR I      EcoR V      HgiA I 
Hinc II     Hpa I       Mst I       Nae I       Nde I 
Not I       Ple I       PmaC I      PpuM I      Pst I 
Pvu I       Pvu II      Sac I       Sac II      Sal I 
Sca I       Sma I       SnaB I      Spe I       Ssp I 
Taq II      Tth111 I    Tth111 II   Xho II      Xma I 
Xmn I 
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Table 25: Annotated Sequence of ipbd gene 



5 1 - C|GGA|CCG 
| Rsr II 



TAT | CCA j GGC 



TTTjACA 
-35 



CTT [ TAT | 



28 



|gct|tcc|ggc|tcg 



TAT | AAT 
-10 



GTG TGG 



52 



AAT | TGT | GAG | CGG | ATA | ACA | ATT 
lac operator 



73 



CCT | AGG 
Avr II 



AGG CTC ACT 



88 



m 


k 


k 


s 


1 


V 


1 


k 


a 


s 


1 


2 


3 


4 


5 


6 


7 


8 


9 


10 


ATG 


AAG 


AAA 


TCT 


CTG 


GTT 


CTT 


AAG 


GCT 


AGC 














Af: 


. II 


Nhe I 


V 


a 


V 


a 


t 


1 


V 


P 


in 


1 


11 


12 


13 


14 


15 


16 


17 


18 


19 


20 


GTT 


GCT 


GTC 


GCG 


ACC 


CTG 


GTA 


CCG 


ATG 


CTG 






1 Nru ] 




L 


Kpn 


-XI 






s 


f 


a 


r 


P 


d 


f 


c 


1 


e 


21 


22 


23 


24 


25 


26 


27 


28 


29 


30 


TCT 


TTT 


GCT 


CGT 


CCG 


GAT 


TTC 


TGT 


CTC 


GAG 








1 AccI] 


□a 






Ava I 


















Xho I 


P 


P 


y 


t 


g 


P 


c 


k 


a 


r 


31 


32 


33 


34 


35 


36 


37 


38 


39 


40 


CCG 


CCA 


TAT 


ACT 


GGG 


CCC 


TGC 


AAA 


GCG 


CGC 




Pf 1M I 


1 








BssH II 



118 



148 



178 



208 



Apa I | 
Dra II 
Pss I 



i 


i 


r 


y 


f 


y 


n 


a 


k 


41 


42 


43 


44 


45 


46 


47 


48 


49 


ATC 


ATC 


CGT 


TAT 


TTC 


TAC 


AAC 


GCT 


AAA 
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Table 25, continued. 



a 
50 


g 

51 


1 

52 


c 
53 


q 

54 


t 

55 


f 
56 


V 
57 


y 

58 


g 

59 


g 

60 


GCA 


GGC 


CTG 


TGC 


CAG 


ACC 


TTT 


GTA 


TAC 


GGT 


GGT 


i- 


Stu 


-li 










Acc I 




















Xca I 









c 


r 


a 


k 


r 


n 


n 


f 


k 








61 


62 


63 


64 


65 


66 


67 


68 


69 








TGC 


CGT 


GCT 


AAG 


CGT 


AAC 


AAC 


TTT 


AAA 




" 295 








Estd I 


-L 
















s 


a 


e 


d 


c 


m 


r 


t 


C 


g 






70 


71 


72 


73 


74 


75 


76 


77 


78 


79 






TCG 


GCC 


GAA 


GAT 


TGC 


ATG 


CGT 


ACC 


TGC 


GGT 


325 




|Xi 


nalll 1 




1 SDh II 










m 


g 


a 


a 


e 


g 


d 


d 










w 


80 


81 


82 


83 


84 


85 


86 












GGC 


GCC 


GCT 


GAA 


GGT 


GAT 


GAT 








346 




Bbe I 






















Nar I 





















P 


a 


k 


a 


a 










87 


88 


89 


90 


91 










CCG 


GCC 


AAA 


GCG 


GCC 










L 


Sfi I 














f 


n 


s 


1 


q 


a 


s 


a 


t 


92 


93 


94 


95 


96 


97 


98 


99 


100 


TTT 


AAC 


TCT 


CTG 


CAA 


GCT 


TCT 


GCT 


ACC 
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388 



[Hind 



e 


y 


i 


g 


y 


a 


101 


102 


103 


104 


105 


106 


GAA 


TAT 


ATC 


GGT 


TAC 


GCG 



w 



409 



Mlu II 



a 




V 


V 


V 


108 


109 


110 


111 


112 


GCC 


ATG 


GTG 


GTG 


GTT 



424 



BstX I [ 



Nco 1 1 
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Table 25, continued. 



i 


V 


g 


a 


t 


i 


g 


i 








113 


114 


115 


116 


117 


118 


119 


120 








ATC 


GTT 


GGT 


GCT 


ACC 


ATC 


GGT 


ATC 






448 


k 


1 


f 


k 


k 


f 


t 


s 


k 


a 




121 


122 


123 


124 


125 


126 


127 


128 


129 


130 




AAA 


CTG 


TTT 


AAG 


AAA 


TTT 


ACT 


TCG 


AAA 


GCG 


478 














lAsu ] 


111 






s 






















131 


132 


133 


134 
















TCT 


TAA 


TAG 


TGA 


GGT 


TAC 


CAG 


TCT 


- 




502 










BstE II | 











| AAG | CCC | GCC | TAA | TGA | GCG | GGC | TTT | TTT | TTT 
I Trp terminator 



532 



CCT | GAG | G -3 ! 
Sau I 1 



539 



Note the following enzyme equivalences: 

Xma III = Eag I 
Acc III = BspM II 
Dra II  = EcoO109 
Asu II  = BstB I 
Sau I   = Bsu36 I 
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Table 26: DNA_seql 



ccg | tcc | gtc | GGA | CCG 
spacer          | Rsr II 

TAT | CCA | GGC 

TTT | ACA 
-35 

CTT | TAT 

| GCT | TCC | GGC | TCG 

TAT | AAT 
-10 

GTG | TGG 

AAT | TGT | GAG | CGG | ATA | ACA | ATT 
lac operator 

CCT | AGG 
Avr II 

gcc | gct | ccT 
spacer 

 s   k   a 
128 129 130 
TCG AAA GCG 
Asu II 

 s   .   .   . 
131 132 133 134 
TCT TAA TAG TGA 

GGT | TAC | CAG | TCT | 
BstE II 

| AAG | CCC | GCC | TAA | TGA | GCG | GGC | TTT | TTT | TTT | 
Trp terminator 

CCT | GAG | Gca | ggt | gag | cg -3' 
Sau I     | spacer 
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Table 27: DNA_synthl 
5'- | CCG | TCC | GTC | GGA | CCG | TAT | CCA | GGC | TTT | ACA | CTT | TAT | 
| GCT | TCC | GGC | TCG | TAT | AAT | GTG | TGG | 

| AAT | TGT | GAG | CGG | ATA | ACA | ATT | 
olig#4 = 3'- gt taa 

| CCT | AGG | 
gga tcc 

/ 3' = olig#3 
| GCC | GCT | CCT | TCG | AAA | GCG | 
cgg cga gga agc ttt cgc 

| TCT | TAA | TAG | TGA | GGT | TAC | CAG | TCT | 
aga att atc act cca atg gtc aga 

| AAG | CCC | GCC | TAA | TGA | GCG | GGC | TTT | TTT | TTT | 
ttc ggg cgg att act cgc ccg aaa aaa aaa 

| CCT | GAG | GCA | GGT | GAG | CG 
gga ctc cgt cca ctc gc -5' 

"Top" strand 99 

"Bottom" strand 100 

Overlap 23 (14 c/g and 9 a/t) 

Net length 158 
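
The overlap figures quoted for Table 27 can be reproduced by complementing the 3' end of the bottom strand and counting base pairs. The following is a minimal sketch under stated assumptions, not part of the original disclosure: the helper names `complement` and `pair_counts` are hypothetical, and the overlap is taken to be the 23 bases running from "gt taa" on the bottom strand to the 3' end of olig#3 (top strand ... CA ATT CCT AGG GCC GCT CCT TCG).

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(strand: str) -> str:
    """Base-by-base complement; spaces used for codon grouping are dropped."""
    return strand.replace(" ", "").translate(COMPLEMENT)


def pair_counts(overlap: str) -> tuple[int, int]:
    """Return (number of c/g pairs, number of a/t pairs) in an overlap."""
    seq = overlap.replace(" ", "").upper()
    gc = sum(base in "GC" for base in seq)
    return gc, len(seq) - gc


# Overlap region of Table 27: top strand 5'->3', bottom strand written 3'->5'.
TOP_OVERLAP = "CA ATT CCT AGG GCC GCT CCT TCG"
BOTTOM_OVERLAP = "gt taa gga tcc cgg cga gga agc"

if __name__ == "__main__":
    # The bottom strand, complemented, should read the same as the top strand.
    print(complement(BOTTOM_OVERLAP).upper() == TOP_OVERLAP.replace(" ", ""))
    print(pair_counts(TOP_OVERLAP))
```

The same two helpers can be applied to the overlap regions of the other synthesis tables to confirm the listed c/g and a/t counts.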
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Table 28: DNA_seq2 



5'- 



gca cca acg 







CCT 


AGG 


AGG 


CTC 


ACT 


1 












Avr II 




















J 




D. 


L 














m 


k 


k 


s 


i 


V 


1 


k 


a 


s 




1 


2 


3 


4 


5 


6 


7 


8 


9 


10 




ATG 


AAG 


AAA 


TCT 


CTG 


GTT 


CTT 


AAG 


GCT 


AGC 
















Af] 


L II 


Nhe I 


Q 


V 


a 


V 


a 


t 


1 


V 


P 


m 


1 




11 


12 


13 


14 


15 


16 


17 


18 


19 


20 




GTT 


GCT 


GTC 


GCG 


ACC 


CTG 


GTA 


CCG 


ATG 


CTG 








1 Nru II 


J- 


Kpn 


-il 








5 


f 


a 


r 


P 


d 


f 


C 


1 


e 




21 


22 


23 


24 


25 


26 


27 


28 


29 


30 




TCT 


TTT 


GCT 


CGT 


CCG 


GAT 


TTC 


TGT 


CTC 


GAG 










! ACCII 


til 






Ava I 




















Xho I 



P 


P 


Y 


t 


g 


P 




C 


k 


a 


r 


31 


32 


33 


34 


35 


36 




37 


38 


39 


40 


CCG 


CCA 


TAT 


ACT 


GGG 


CCC 


TGC 


AAA 


GCG 


CGC 




Pj 


eiM i 




1 










BssH II 










Ana I 


















Dra II" 












Pss I 





i 


i 


r 


41 


42 


43 


ate 


ate 


cgt 



t 

127 
ACT 



s 
128 
TCG 
Asu 



k 
129 
AAa 
III 



gcg | get | gcg 
spacer 



- 3 l 
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Table 29: DNA_synth2 



5'- | GCA | CCA | ACG | 

| CCT | AGG | AGG | CTC | ACT | 
| ATG | AAG | AAA | TCT | CTG | GTT | CTT | AAG | GCT | AGC | 
| GTT | GCT | GTC | GCG | ACC | CTG | GTA | CCG | ATG | CTG | 
olig#6 = 3'- ggc tac gac 

/ 3' = olig#5 
| TCT | TTT | GCT | CGT | CCG | GAT | TTC | TGT | CTC | GAG | 
aga aaa cga gca ggc cta aag aca gag ctc 

| CCG | CCA | TAT | ACT | GGG | CCC | TGC | AAA | GCG | CGC | 
ggc ggt ata tga ccc ggg acg ttt cgc gcg 

| ATC | ATC | CGT | 
tag tag gca 

| ACT | TCG | AAA | GCG | GCT | GCG | 
tga agc ttt cgc cga cgc -5' 

"Top" strand 99 
"Bottom" strand 99 
Overlap 24 (14 c/g and 10 a/t) 
Net length 155 
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Table 30: DNA_seq3 



ccc | tgc j aca 
spacer 



a 

39 
GCG 
BssH 



r 

40 
CGC 
II 



i 


i 


r 


Y 


f 


y 


n 


a 


k 


41 


42 


43 


44 


45 


46 


47 


48 


49 


ATC 


ATC 


CGT 


TAT 


TTC 


TAC 


AAC 


GCT 


AAA 



a 


g 


1 


C 


q 


t 


f 


V 


y 


g 


50 


51 


52 


53 


54 


55 


56 


57 


58 


59 


GCA 


GGC 


CTG 


TGC 


CAG 


ACC 


TTT 


GTA 


TAC 


GGT 


-L 


Stu 












Acc I 


















Xca I 





c 
61 
TGC 



r 
62 
CGT 



a 
63 
GCT 



k 
64 
AAG 
I 



r 


n 


n 


f 


k 


65 


66 


67 


68 


69 


CGT 


AAC 


AAC 


TTT 


AAA 



S 


a 


e 


d 


c 


in 


r 


t 


c 


g 


70 


71 


72 


73 


74 


75 


76 


77 


78 


79 


TCG 


GCC 


GAA 


GAT 


TGC 


ATG 


CGT 


ACC 


TGC 


GGT 



I Xmalll 



Sph I 



g 


a 




80 


81 




GGC 


GCC 


get | gaa 


Bbe I 


spacer 


Nar I 





[ttt 



t 

127 
acT 



s 
128 
TCG 
Asu 



k 
129 
AAa 
III 



gcg | teg | ccg | - 3 ' 
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Table 31: DNA_synth3 



5'- | CCC | TGC | ACA | GCG | CGC | 

| ATC | ATC | CGT | TAT | TTC | TAC | AAC | GCT | AAA | 

| GCA | GGC | CTG | TGC | CAG | ACC | TTT | GTA | TAC | GGT | GGT | 
olig#8 = 3'- g cca cca 

/ 3' = olig#7 
| TGC | CGT | GCT | AAG | CGT | AAC | AAC | TTT | AAA | 
acg gca cga ttc gca ttg ttg aaa ttt 

| TCG | GCC | GAA | GAT | TGC | ATG | CGT | ACC | TGC | GGT | 
agc cgg ctt cta acg tac gca tgg acg cca 

| GGC | GCC | GCT | GAA | 
ccg cgg cgt ctt 

| TTT | ACT | TCG | AAA | GCG | TCG | CCG | 
aaa tga agc ttt cgc agc ggc -5' 



"Top" strand 93 

"Bottom" strand 97 

Overlap 25 (15 g/c & 10 a/t) 

Net length 146 
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Table 32: DNA_seq4 





g 


a 




80 


81 


cct | cgc | cct 


GGC 


GCC 


soacer 


Bbe I 




Nar I 



a 
82 
GCT 



e 


g 


a 


d 


83 


84 


85 


86 


GAA 


GGT 


GAT 


GAT 



P 


a 


k 


a 


a 


87 


88 


89 


90 


91 


CCG 


GCC 


AAA 


GCG 


GCC 



Sfi I 



f 


n 


s 


1 


q 


a 


S 


a 


t 


92 


93 


94 


95 


96 


97 


98 


99 


100 


TTT 


AAC 


TCT 


CTG 


CAA 


GCT 


TCT 


GCT 


ACC 



Hind 3 1 



e 


y 


i 


g 


y 


a 


w 


101 


102 


103 


104 


105 


106 


107 


GAA 


TAT 


ATC 


GGT 


TAC 


GCG 


TGG 










1 Mlu 1 





a 


m 


V 


V 


V 


108 


109 


110 


111 


112 


GCC 


ATG 


GTG 


GTG 


GTT 



BstX I 



Nco 1 1 



i 


V 


g 


a 


t 


i 


g 


i 


113 


114 


115 


116 


117 


118 


119 


120 


ATC 


GTT 


GGT 


GCT 


ACC 


ATC 


GGT 


ATC 



k 


1 


f 


k 


k 


f 


t 


s 


k 




121 


122 


123 


124 


125 


126 


127 


128 


129 




AAA 


CTG 


TTT 


AAG 


AAA 


TTT 


ACT 


TCG 


AAa 


gcg|tcg|ggc 














lAsu 1 


[I sr>acer 
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Table 33: DNA_synth4 
5'- | CCT | CGC | CCT | GGC | GCC | GCT | GAA | GGT | GAT | GAT | 
| CCG | GCC | AAA | GCG | GCC | 

| TTT | AAC | TCT | CTG | CAA | GCT | TCT | GCT | ACC | 

| GAA | TAT | ATC | GGT | TAC | GCG | TGG | 
olig#10 = 3'- ata tag cca atg cgc acc 

/ 3' = olig#9 
| GCC | ATG | G TG | GTG | GTT | 
cgg tac cac cac caa 

| ATC | GTT | GGT | GCT | ACC | ATC | GGT | ATC | 
tag caa cca cga tgg tag cca tag 

| AAA | CTG | TTT | AAG | AAA | TTT | ACT | TCG | AAA | GCG | TCT | TGA | 
ttt gac aaa ttc ttt aaa tga agc ttt cgc aga act -5' 

"Top" strand 100 
"Bottom" strand 93 
Overlap 25 (14 c/g and 11 a/t) 
Net length 149 
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Table 34: Some interaction sets in BPTI 





Number 
Diff . 

AAs Contents 


BPTI 


1 


2 


3 


4 


5 




2 


D -32 














-4 


2 


E -32 














—3 


5 


T P F Z -29 














-2 


10 


Z3 R3 Q2 T2 H G L K E -18 














—i 

JL 


10 


D4 T2 P2 Q2 E G N K R -18 


— 












1 
X 


10 


R21 A2 K2 H2 P L I T G D 


R 










5 




9 


P20 R4 A2 H2 N E V F L 


P 








s 


5 


q 


10 


D15 K6 T3 R2 P2 S Y G A L 


D 








4 


s 


*t 


7 


F19 D4 L3 Y2 12 A2 S 


F 








s 


5 


•j 


1 


C33 


c 








X 


X 


c. 

D 


10 


Lll E5 N4 K3 Q2 12 Y2 D2 T R 


L 








4 




7 


5 


L18 Ell K2 S Q 


E 






s 


4 




Q 
O 


7 


P26 H2 A2 I L G F 


P 






3 


4 




Q 


9 


P17 A6 V3 R2 Q L K Y F 


P 




s 


3 


4 




XV 


10 


Yll E7 D4 A2 N2 R2 V2 S I D 


Y 


s 




s 


4 




1 1 
11 


10 


T17 P5 A3 R2 I S Q Y V K 


T 


1 


s 


3 


4 




1Z 


2 


G32 K 


G 


x 




x 


x 




X J 


5 


P22 R6 L3 N I 


p 


1 




s 


4 


s 




3 


C31 T A 


c 


1 




s 


s 


5 


X«J 


12 


K15 R4 Y2 M2 L2 -2 V G A I N 


F K 


1 


s 


3 


4 


s 




7 


A22 G5 Q2 R K D F 


A 


1 


s 


s 


s 


5 


1 7 


12 


R12 K5 A2 Y3 H2 S2 F2 L M T G 


P R 


1 


2 


3 




s 


J. o 


6 


121 M4 F3 L2 V2 T 


I 


1 


s 


s 




5 


1 Q 
X? 


7 


111 P10 R6 S2 K2 L Q 


I 


1 


2 


3 




s 


on 


5 


R19 A7 S4 L2 Q 


R 


s 


s 


s 




5 


z X 


4 


Y18 F13 W I 


Y 




2 


s 


s 


s 


z z 


6 


F14 Y14 H2 A N S 


F 




s 


3 


4 




z J 


2 


Y32 F 


Y 






s 


s 




Z *t 


4 


N26 K3 D3 S 


N 




s 


3 






25 


10 


A12 S5 Q3 P3 W3 L2 T2 K G R 


A 






s 


s 




26 


9 


K16 A6 T2 E2 S2 R2 G H V 


K 




s 


3 


4 




27 


5 


A18 S8 K3 L2 T2 


A 




2 


3 


4 




28 


7 


G13 K10 N5 Q2 R H M 


G 




2 


s 


s 




29 


10 


L9 Q7 K7 A2 F2 R2 M G T N 


L 




2 


3 






30 


1 


C33 


C 




X 


X 


X 




31 


7 


Q12 Ell L4 K2 V2 Y N 


Q 




2 


3 


4 




32 


11 


T12 P5 K4 Q3 E2 L2 G V S R A 


T 




2 


3 


s 




33 


1 


F33 


F 


X 


X 


X 


X 




34 


11 


Vll 18 T3 D2 N2 Q2 F H P R K 


V 


1 


2 


3 


s 




35 


2 


Y31 W2 


Y 


s 


s 


s 




5 


36 


3 


G27 S5 R 


G 


1 










37 


1 


G33 


G 


X 








X 


38 


3 


C31 T A 


C 


1 






s 


5 


39 


7 


R13 G9 K4 Q3 D2 P M 


R 


1 






4 


s 
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Table 34: continued. 



Number 



Res. 

# 


Diff. 
AAs 


Contents 


BPTI 


12 3 


4 


5 


40 


2 


G22 


All 


A 


s 


S 


5 


41 


3 


N20 


Kll D2 


K 




4 


s 


42 


9 


All 


R9 S4 G3 H2 D Q K N 


R 




s 


5 


43 


2 


N31 


G2 


N 






s 


44 


3 


N21 


Rll K 


N 






s 


45 


2 


F32 


Y 


F 






s 


46 


8 


K24 


E2 S2 D H V Y R 


K 






5 


47 


2 


T19 


S14 


S 


s 




5 


48 


9 


All 


19 E4 T2 W2 L2 R K D 


A 


2 s 




s 


49 


7 


E19 


D6 A2 Q2 K2 T H 


E 


2 




s 


50 


6 


E16 


D12 L2 MQK 


D 


s 




5 


51 


1 


C33 




C 


X 




X 


52 


7 


R13 


M10 L3 E3 Q2 H V 


M 


2 




s 


53 


8 


R21 


Q3 E2 H2 C2 G K D 


R 


s 




5 


54 


7 


T23 


A3 V2 E2 I Y K 


T 






5 


55 


1 


C33 




C 






X 


56 


8 


G15 


V8 13 E2 R2 A L S 


G 








57 


8 


G19 


V4 A3 P2 -2 R L N 


G 








58 


8 


All 


-10 P3 K3 S2 Y2 R F 


A 








59 


9 


-24 


G2QEAYSPR 










60 


6 


-28 


Q R I G D 










61 


3 


-31 


T P 










62 


2 


-32 


D 










63 


2 


-32 


K 










64 


2 


-32 


S 











s indicates secondary set 

x indicates in or close to surface but buried and/or 
highly conserved. 
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Table 35: 
Distances from Cβ to 
Tip of Side Group 
in Å 

Amino Acid type    Distance 

A                  0.0 
C (reduced)        1.8 
D                  2.4 
E                  3.5 
F                  4.3 
G 
H                  4.0 
I                  2.5 
K                  5.1 
L                  2.6 
M                  3.8 
N                  2.4 
P                  2.4 
Q                  3.5 
R                  6.0 
S                  1.5 
T                  1.5 
V                  1.5 
W                  5.3 
Y                  5.7 

Notes: These distances were calculated for standard model 
parts with all side groups fully extended. 
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Table 36: Distances, BPTI residue set #2 
Distances in Å between Cβ atoms. 

A hypothetical Cβ was added to each Glycine. 





        R17   I19   Y21   A27   G28   L29   Q31   T32   V34   A48


T1 Q 


7 7 




























VO 1 

X 4b X 


X * X 


8.4 


























A97 


00 fi 


17 1 

X / • X 


12.2 
























CZOR 

O 


06 fi 


00 4 


13 • 8 


5.3 






















T 0 Q 




IE Q 
XO . O 


Q fi 


5 1 

■J . X 


5 ♦ 2 






















1 fi 1 
XO * X 


in a 


fi ft 


6.8 


10 . 6 


6 


.8 


















11 7 
JLx • / 


K 0 


fi 1 

0 • X 


19 0 

X<£ ♦ v 


15 - 5 


10 


.9 


5.4 














V J *± 


O . O 




11 fi 

X X • v> 


17.6 


21.7 


18 


.0 


11.4 


8. 


2 










a a c 

0 


1ft R 
XO • o 


ii n 

XX . \J 


R A 


12 6 

X^> . 


13 . 3 


8 


.4 


8.8 


8. 


3 


15. 


7 






FAQ 


00 O 


14 7 

X*T . / 


8 • 9 


16 . 9 


16 . 1 


12 


.2 


13 .9 


13. 


3 


19. 


8 


5 


.5 




01 6 


16 . 3 


8 * 6 


12 . 2 


10 . 3 


7 


.6 


11.3 


13 . 


2 


20. 


0 


6 


.2 


XT J 


1 a n 


11 1 

X X • «J 


9 . 0 


12 . 2 


15.4 


13 


.3 


7.9 


9. 


2 


8. 


7 


13 


.9 


T1 1 


^ a -J 


11 2 

X X • *b 


13 . 5 


18 . 8 


22 . 5 


19 


.8 


13 .5 


12. 


1 


5. 


7 


18 


.5 


I\X^> 


7 Q 


1 A fi 
X*i * 0 


6v 1 X 


27 . 4 


31.3 


27 


.9 


21.4 


18. 


1 


10. 


3 


24 


.6 


Z11 fi 


R 5 


in 1 

X VJ « X 


X "-J • ~? 


25.2 


'28.5 


24 


.6 


18 . 6 


14. 


5 


8. 


6 


19 


.8 


T1 0 

x xo 


fi 1 
0 * X 


fi n 


11 !> 


21.3 


24.4 


on 


* ^ 


14 .7 


10. 


4 


7. 


0 


15 


.0 


R20 


10. 6 


5.9 


5.4 


16 . 0 


XO • D 


14 


.6 


Q Q 


6. 


9 


7. 


8 


10 


.2 


F22 


"15.6 


10.9 


5.6 


10.5 


12.8 


10 


.3 


6.2 


8. 


1 


10.8 


10.3 


N24 


19.9 


14.7 


9.4 


4.1 


7.3 


6 


.1 


4.8 


10. 


0 


14. 


7 


11 


.4 


K26 


24.4 


20.1 


15.2 


5.4 


7.7 


9 


.8 


10.1 


15.3 


19. 


0 


17 


.0 


C30 


18.9 


12.1 


4.6 


8.8 


9.5 


5 


.3 


5.9 


8. 


2 


14. 


9 


4 


.9 


F33 


10.8 


7.4 


7.7 


12.6 


16.4 


13 


.0 


6.6 


5. 


6 


5. 


5 


12 


.2 


Y35 


8.4 


7.4 


9.4 


18.4 


21.4 


17 


.9 


12.2 


9. 


5 


5. 


8 


14 


.4 


S47 


17.6 


10.6 


6.6 


17.3 


17.9 


13 


.4 


12.6 


10. 


4 


15. 


9 


5 


.3 


D50 


20.0 


13.6 


7.2 


17.2 


16.8 


13 


.5 


13.5 


12. 


9 


17. 


6 


7 


.6 


C51 


18.9 


12.2 


4.0 


12.1 


12.2 


8 


.8 


8.8 


9. 


7 


15. 


3 


5 


.4 


R53 


25.4 


18.6 


11.0 


17.2 


15.0 


13 


.0 


15.7 


16. 


7 


22. 


3 


9 


.7 


R39 


15.4 


16.9 


17.1 


24.9 


27.2 


24 


.9 


20.1 


18. 


7 


13. 


8 


22 


.3 



■ J 
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Table 36, continued. 
Distances in Å between Cβ.

Hypothetical Cβ was added to each Glycine.







        E49   M52    P9   T11   K15   A16   I18   R20   F22   N24
M52     6.1
P9     17.7  15.5
T11    22.1  21.5   7.2
K15    27.5  28.7  16.4   9.5
A16    22.2  24.2  14.9   9.8   6.2
I18    17.4  19.5  12.2   9.5  10.4   4.9
R20    13.0  13.8   8.0   9.4  14.9  10.6   6.2
F22    13.8  11.4   4.1  10.6  19.1  16.3  12.7   6.9
N24    15.6  11.2   8.4  15.3  24.1  21.9  18.2  12.7   6.6
K26    20.9  15.7  12.1  18.6  27.9  26.6  23.3  18.1  11.6   5.9
C30     8.7   5.6  10.6  16.6  24.1  20.2  15.7   9.8   6.8   6.9
F33    16.5  15.4   4.2   7.1  15.0  12.8   9.6   6.1   5.6   9.3
Y35    17.2  17.8   7.8   5.8  11.0   7.6   4.9   4.3   8.8  14.8
S47     4.7   9.1  15.3  18.5  23.1  17.6  12.8   9.1  12.0  15.3
D50     5.5   7.7  14.7  18.6  24.2  19.2  14.7   9.9  11.0  14.7
C51     7.1   5.4  11.0  16.4  23.5  19.2  14.6   8.7   6.9   9.6
R53     6.3   5.6  17.9  23.1  29.6  24.8  20.3  15.0  13.8  15.5


fs r 






24 . 0 


13 . 0 


9,5 


12.0 


11.8 


12.5 


12.8 


14. 


7 


20. 


8 






        K26   C30   F33   Y35   S47   D50   C51   R53
C30    12.4
F33    13.9  10.1
Y35    19.5  13.5   6.4
S47    21.0   8.8  13.5  13.2
D50    20.1   8.6  14.3  13.7   5.0
C51    15.0   3.7  10.9  12.5   6.9   5.2
R53    19.9   9.9  18.2  18.8   9.4   5.8   7.4
R39    24.3  20.6  14.4   9.6  20.4  19.0  18.8  23.4
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Table 37: vgDNA to vary BPTI set #2.1 

+ 



CAC | CCT 


g 

35 
GGG 


P 
36 
CCC 


c 
37 
TGC 


k 

38 
AAA 


a 
39 
GCG 


X 
40 
qfk 


spacer 


Ape 


l I 





i 


X 


r 


y 


f 


y 


n 


a 


k 








41 


42 


43 


44 


45 


46 


47 


48 


49 








ATC 


qfk 


CGT 


TAT 


TTC 


TAC 


AAC 


GCT 


AAA 






235 














/ 3 


( = olig#27 


72 nts 


+ 


I 


+ 








1 


+ 










X 


g 


X 


C 


q 


t 


f 


X 


y 


g 


g 




50 


51 


52 


53 


54 


55 


56 


57 


58 


59 


60 




afk 


GGt 


qfk 


TGC 


CAG 


ACC 


TTc 


qfk 


TAC 


GGT 


GGT 


268 


olig- 


f28= 


3 1 - 


acg gtc tgg aag 


**m atg 


cca 


cca 





78 nts 



Overlap =12 (7 CG, 5 AT) 



c 


r 


a 


k 


r 


n 


n 


f 


k 


61 


62 


63 


64 


65 


66 


67 


68 


69 


TGC 


CGT 


GCT 


AAG 


CGT 


AAC 


AAC 


TTT 


AAA 


acg 


gca 


cga 


ttc 


gca 


ttg 


ttg 


aaa 


ttt 






Esid I 


-i 











s 


X 


e 


d 


c 


m 


70 


71 


72 


73 


74 


75 


TCT 


qfk 


GAG 


GAT 


TGC 


ATG 



age **m etc eta acg^ tac gca ccc acc -5 ! 

| Suh I | spacer | 

k = equal parts of T and G; m = equal parts of C and A;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above

Residue            40     42     50     52     57     71
Possibilities      21  x  21  x  21  x  21  x  21  x  21  = 8.6 x 10^7
Abundance x 10
of PPBD           .768   .271   .459   .671   .600   .459
Product = 1.77 x 10^-8

Parent = 1/(5.5 x 10^7)    least favored = 1/(4.2 x 10^9)
Least favored one-amino-acid substitution from PPBD present
at 1 in 1.6 x 10^7
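
The totals in the summary above can be re-derived with a few lines of arithmetic. The following is only a sketch: it assumes the figures are read as printed, with the abundance row giving ten times the fractional abundance of the parental residue at each variegated position. The variable names are illustrative and do not come from the specification.

```python
from math import prod

# Six variegated residues (40, 42, 50, 52, 57, 71), each with 21 outcomes
# (20 amino acids plus stop), as in the "Possibilities" row.
possibilities = 21 ** 6

# "Abundance x 10 of PPBD" row; dividing by 10 gives fractional abundances.
abundance_x10 = [0.768, 0.271, 0.459, 0.671, 0.600, 0.459]
parent_fraction = prod(a / 10 for a in abundance_x10)

print(f"possibilities   = {possibilities:.2e}")
print(f"parent fraction = {parent_fraction:.3g}")
print(f"parent is about 1 in {1 / parent_fraction:.2e}")
```

The reciprocal of the product is the expected number of clones per parental sequence and can be compared with the "Parent" figure printed in the table.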
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Table 38: Result of varying set#2 of BPTI 2.1 



 l   e   p   p   y   t   g   p   c   k   a   D
 29  30  31  32  33  34  35  36  37  38  39  40
CTC GAG CCG CCA TAT ACT GGG CCC TGC AAA GCG GAT
|Ava I|     |   PflM I    |
|Xho I|                 |Apa I|
                        |Dra II |
                        |Pss I  |

 i   Q   r   y   f   y   n   a   k
 41  42  43  44  45  46  47  48  49
ATC CAG CGT TAT TTC TAC AAC GCT AAA

 E   g   L   c   q   t   f   S   y   g   g
 50  51  52  53  54  55  56  57  58  59  60
GAG GGC CTG TGC CAG ACC TTT TCG TAC GGT GGT

 c   r   a   k   r   n   n   f   k
 61  62  63  64  65  66  67  68  69
TGC CGT GCT AAG CGT AAC AAC TTT AAA
        | Esp I |

 s   W   e   d   c   m   r   t   c   g
 70  71  72  73  74  75  76  77  78  79
TCG TGG GAA GAT TGC ATG CGT ACC TGC GGT
                 |Sph I |

 g   a
 80  81
GGC GCC
|Bbe I|
|Nar I|
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Table 39: vgDNA to vary set#2 BPTI 2.2 

+ 





g 


P 


c 


x 


a 


D 




35 


36 


37 


38 


39 


40 


ccr crca cac 


GGG 


CCC 


TGC 


mrA 


GCG 


GAT 


| spacer 


Apa I 





+ + + 



X 


Q 


X 


X 


f 


y 


n 


a 


k 


41 


42 


43 


44 


45 


46 


47 


48 


49 


rwA 


CAG 


rvk 


TwT 


TTC 


TAC 


AAC 


GCT 


AAA 



+ + + 



E 


X 


L 


c 


X 


X 


f 


S 


y 


g 


g 


50 


51 


52 


53 


54 


55 


56 


57 


58 


59 


60 


GAG 


qfk 


CTG 


TGC 


qfk 


qfk 


TTT 


TCG 


TAC 


GGT 


GGT 



31 nts olig#30 3 1 - g cca cca 



Overlap = 15 (11 CG, 4 AT) 



/- 3 f olig#29 94 nts 



c 


r 


a 


k 


r 


n 


n 


f 


k 


61 


62 


63 


64 


65 


66 


67 


68 


69 


TGC 


CGT 


GCT 


AAG 


CGT 


AAC 


AAC 


TTT 


AAA 



acg gca cga ttc gca ttg ttg aaa ttt 
1 ESP I L 



+ 



s 


W 


X 


d 


c 


m 


70 


71 


72 


73 


74 


75 


TCG 


TGG 


qfk 


GAT 


TGC 


ATG 



age acc **m eta acg tac gcg acc tgc -5' 

| Soh I | spacer | 

k = equal parts of T and G; v = equal parts of C, A, and G;
m = equal parts of C and A; r = equal parts of A and G;
w = equal parts of A and T;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above

Residue            38     41     43     44     51     54     55     72
Possibilities       4  x   4  x   9  x   2  x  21  x  21  x  21  x  21
                 = 6.2 x 10^7
Abundance x 10    2.5    2.5   .833    5.    .663   .397   .437   .602
Product = 2.3 x 10^-8

Parent = 1/(4.4 x 10^7)    least favored = 1/(1.25 x 10^9)
Least favored one-amino-acid substitution from PPBD present
at 1 in 1.2 x 10^7
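
The legend defines the variegated codon qfk through nucleotide mixtures at each position. As a hedged illustration of how such a mixture maps onto amino-acid frequencies, the sketch below assumes ideal incorporation of each nucleotide at the stated fractions and the standard genetic code. The function name `codon_distribution` is hypothetical, and the computed fractions need not coincide exactly with the abundance row printed in the table.

```python
from itertools import product
from collections import defaultdict

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Nucleotide mixtures q, f, and k as given in the legend.
q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
f = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
k = {"T": 0.50, "G": 0.50}

def codon_distribution(first, second, third):
    """Return {amino acid or '*': probability} for one variegated codon."""
    dist = defaultdict(float)
    for (b1, p1), (b2, p2), (b3, p3) in product(
        first.items(), second.items(), third.items()
    ):
        dist[CODE[b1 + b2 + b3]] += p1 * p2 * p3
    return dict(dist)

qfk = codon_distribution(q, f, k)
for aa, p in sorted(qfk.items(), key=lambda item: -item[1]):
    print(f"{aa}  {p:.4f}")
```

Under these assumptions the qfk codon yields 21 distinct outcomes (20 amino acids plus one stop codon, TAG), which is where the factor of 21 per fully variegated residue comes from.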
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Table 40: Result of varying set#2 of BPTI 2.2 



 l   e
 29  30
CTC GAG                                      178
|Xho I|

 p   p   y   t   g   p   c   E   a   D
 31  32  33  34  35  36  37  38  39  40
CCG CCA TAT ACT GGG CCC TGC GAG GCG GAT
    |   PflM I    |
                |Apa I|

 V   Q   N   F   f   y   n   a   k
 41  42  43  44  45  46  47  48  49
GTT CAG AAT TTT TTC TAC AAC GCT AAA

 E   F   L   c   S   A   f   S   y   g   g
 50  51  52  53  54  55  56  57  58  59  60
GAG TTT CTG TGC TCT GCT TTT TCG TAC GGT GGT

 c   r   a   k   r   n   n   f   k
 61  62  63  64  65  66  67  68  69
TGC CGT GCT AAG CGT AAC AAC TTT AAA
        | Esp I |

 s   W   Q   d   c   m   r   t   c   g
 70  71  72  73  74  75  76  77  78  79
TCG TGG CAG GAT TGC ATG CGT ACC TGC GGT
                 |Sph I |

 g   a
 80  81
GGC GCC
|Bbe I|
|Nar I|
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Table 41: vg DNA set#2 of BPTI 2.3 





1 


e 




29 


30 


ccr aac eta 


CTC 


GAG 


1 spacer 


Xho I 



178 



p 


X 


y 


X 


g 


P 


c 


E 


a 


X 


31 


32 


33 


34 


35 


36 


37 


38 


39 


40 


CCG 


vma 


TAT 


vma 


GGG 


CCC 


TGC 


GAG 


GCG 


qfk 


V 


Q 


N 


+ 
X 


f 


y 


n 


a 


k 




41 


42 


43 


44 


45 


46 


47 


- 48 


49 




GTT 


GAG 


AAT 


Tdk 


TTC 


TAC 


AAC 


GCC 


AAq 


-3' 



67 nts olig#34 3»- g atg ttg egg ttc 
Overlap =13 (7 CG, 6 AT) 



+ 
X 
50 
vAG 



F 

51 
TTT 



+ 
X 
52 
nTk 



c 
53 
TGC 



S 

54 
TCT 



+ 
X 
55 
qfk 



f 

56 
TTT 



+ 
X 
57 
qfk 



y 

58 
TAC 



g 

59 
GGT 



208 



olig#33 71 nts 



g 

60 
GGT 



268 



btc aaa nam acg aga **m aaa **m atg cca cca 



c 


r 


a 


k 




61 


62 


63 


64 




TGC 


CGT 


GCT 


AAG 


C 



acg gca cga ttc gcg acc ggc 
| Esp I 1 spacer [ 

k = equal parts of T and G; m = equal parts of C and A;
w = equal parts of A and T; n = equal parts of A,C,G,T;
d = equal parts A,G,T; v = equal parts A,C,G;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above

Residue            32     34     40     44     50     52     55     57
Possibilities       6  x   6  x  21  x   6  x   3  x   5  x  21  x  21
                 = 3 x 10^7
Abundance x 10
of PPBD           10/6   10/6   .545   10/6   10/3   30/8   .459   .701
Product = 1.01 x 10^-7

Parent = 1/(1 x 10^7)    least favored = 1/(4 x 10^8)
Least favored one-amino-acid substitution from PPBD present
at 1 in 3 x 10^7
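
The per-residue counts in the "Possibilities" row follow from expanding each degenerate codon against the genetic code. The sketch below illustrates this for the equal-parts codons used in this table (vma, Tdk, vAG, nTk). It assumes the standard genetic code and counts a stop codon as one outcome; the helper name `outcomes` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Equal-parts mixtures from the legend, keyed by their lowercase symbol.
MIX = {"k": "TG", "m": "CA", "w": "AT", "n": "ACGT", "d": "AGT", "v": "ACG"}

def outcomes(codon):
    """Set of amino acids (or '*' for stop) encoded by a degenerate codon."""
    # Symbols not in MIX are treated as fixed bases, whatever their case.
    choices = [MIX.get(ch, ch.upper()) for ch in codon]
    return {CODE["".join(c)] for c in product(*choices)}

for residue, codon in [(32, "vma"), (34, "vma"), (44, "Tdk"), (50, "vAG"), (52, "nTk")]:
    found = outcomes(codon)
    print(residue, codon, len(found), "".join(sorted(found)))
```

The counts for these five positions, together with 21 for each of the three qfk positions (40, 55, 57), give the factors multiplied in the table.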
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Table 42: Result of varying set#2 of BPTI 2,3 



 l   e
 29  30
CTC GAG
|Ava I|
|Xho I|

 p   E   y   Q   g   p   c   E   a   A
 31  32  33  34  35  36  37  38  39  40
CCG GAG TAT CAG GGG CCC TGC GAG GCG GCT
                |Apa I|

 V   Q   N   W   f   y   n   a   k
 41  42  43  44  45  46  47  48  49
GTT CAG AAT TGG TTC TAC AAC GCT AAA

 Q   F   M   c   S   L   f   H   y   g   g
 50  51  52  53  54  55  56  57  58  59  60
CAG TTT ATG TGC TCT CTT TTT CAT TAC GGT GGT

 c   r   a   k   r   n   n   f   k
 61  62  63  64  65  66  67  68  69
TGC CGT GCT AAG CGT AAC AAC TTT AAA
        | Esp I |

 s   W   Q   d   c   m   r   t   c   g
 70  71  72  73  74  75  76  77  78  79
TCG TGG CAG GAT TGC ATG CGT ACC TGC GGT
                 |Sph I |

 g   a
 80  81
GGC GCC
|Bbe I|
|Nar I|



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Table 101a: VTIIsianal; :bt?ti: ;VIII-coat gene 
pbd mod14: 9 V 89: Sequence cloned into pGEM-MB1
pGEM-3Zf(-)[HincII]::lacUV5 SacI/gene/
TrpA attenuator/(SalI)::pGEM-3Zf(-)[HincII]!

5'-(GAATTC GAGCTCGGTACCCGG GGATCC TCTAGAGTC)-    ! polylinker
GGC tttaca CTTTATGCTTCCGGCTCG tataat GTG         ! lacUV5
TGG aATTGTGAGCGcTcACAATT                         ! lacO-symm operator
gagctc AG(G)AGG CttaCT                           ! SacI; Shine-Dalgarno seq. a
atg aag aaa tct ctg gtt ctt aag gct agc          ! 10, M13 leader
gtt gct gtc gcg acc ctg gta cct atg ttg          ! 20 <- codon #
tcc ttc gct cgt ccg gat ttc tgt ctc gag          ! 30
cca cca tac act ggg ccc tgc aaa gcg cgc          ! 40
atc atc cgC tat ttc tac aat gct aaa gca          ! 50
ggc ctg tgc cag acc ttt gta tac ggt ggt          ! 60
tgc cgt gct aag cgt aac aac ttt aaa tcg          ! 70
gcc gaa gat tgc atg cgt acc tgc ggt ggc          ! 80
gcc gct gaa ggt gat gat ccg gcc aaG gcg          ! 90
gcc ttc aat tct ctG caa gct tct gct acc          ! 100
gag tat att ggt tac gcg tgg gcc atg gtg          ! 110
gtg gtt atc gtt ggt gct acc atc ggg atc          ! 120
aaa ctg ttc aag aag ttt act tcg aag gcg          ! 130
tct taa tga tag GGTTACC                          ! BstEII
AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT          ! terminator
aTCGA -                                          ! (SalI ghost)
(GACCTGCAGGCATGC AAGCTT ... -3')                 ! pGEM polylinker

Notes:

a Designed sequence contained AGGAGG, but sequencing
indicates that actual DNA contains AGAGG.
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Table 101b: VHI-sicmal: :bt>ti : iVIII-coat gene 
BamHI-SalI cassette, after insertion of SalI linker
in PstI site of pGEM-MB1.
pGEM-3Zf(-)[HincII]::lacUV5 SacI/gene/
TrpA attenuator/(SalI)::pGEM-3Zf(-)[HincII]!

5'-GAATTC GAGCTC GGTACCCGG GGATCC TCTAGA GTC-    ! BamHI
GGC tttaca CTTTATGCTTCCGGCTCG tataat GTG         ! lacUV5
TGG aATTGTGAGCGcTcACAATT                         ! lacO-symm operator
gagctc AGAGG CttaCT                              ! SacI; Shine-Dalgarno seq.
atg aag aaa tct ctg gtt ctt aag gct agc          ! 10, M13 leader
gtt gct gtc gcg acc ctg gta cct atg ttg          ! 20 <- codon #
tcc ttc gct cgt ccg gat ttc tgt ctc gag          ! 30
cca cca tac act ggg ccc tgc aaa gcg cgc          ! 40
atc atc cgC tat ttc tac aat gct aaa gca          ! 50
ggc ctg tgc cag acc ttt gta tac ggt ggt          ! 60
tgc cgt gct aag cgt aac aac ttt aaa tcg          ! 70
gcc gaa gat tgc atg cgt acc tgc ggt ggc          ! 80
gcc gct gaa ggt gat gat ccg gcc aaG gcg          ! 90
gcc ttc aat tct ctG caa gct tct gct acc          ! 100
gag tat att ggt tac gcg tgg gcc atg gtg          ! 110
gtg gtt atc gtt ggt gct acc atc ggg atc          ! 120
aaa ctg ttc aag aag ttt act tcg aag gcg          ! 130
tct taa tga tag GGTTACC                          ! BstEII
AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT          ! terminator
aTCGA GACctgca GGTCGACC ggcatgc-3'
               | SalI |
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Table 102a: Annotated Sequence of gene 
found in pGEM-MBl 



5'-(G GATCC TCTAGA GTC) GGC- 
from pGEM polyl inker 



nucleotide 
number 



10 



tttaca CTTTATGCTTCCGGCTCG tataat GTGTGG- 
-35 lacUV5 -10 



39 



15 aATTGTGAGCGcTcACAATT - 
lacO-symm operator 



59 



20 



aaactc AGfG) AGG 

SacI Shine-Dalgarno seq. a 



CttaCT- 



77 



25 



30 



35 



45 



fM 


K 


K 


S 


L 


1 


2 


3 


4 


5 


ATG 


AAG 


AAA 


TCT 


CTG 


V 


A 


V 


A 


T 


11 


12 


13 


14 


15 


GTT 


GCT 


GTC 


GCG 


ACC 






1 Nru ] 


a 


S 


F 


A 


R 


p 


21 


22 


23 


24 


25 


TCC 


TTC 


GCT 


CGT 


CCG 








|AccI3 



V 
6 
GTT 



L 
7 
CTT 
Afl 



K 
8 
AAG 
II 



A 
9 
GCT 
Nhe 



S 

10 
AGC 
I 



Kpn 1 1 



40 M13/BPTI Jnct 



D 
26 
GAT 



P 
31 



p 


Y 


T 


G 


P 




32 
CCA 


33 
TAC 


34 
ACT 


35 
GGG 


36 
CCC 


T 


Pf 1M I 


1 












Aoa I 










Dra II 








Pss I 



F 
27 
TTC 



C 
37 



C 
28 
TGT 



K 

38 
AAA 



L 


E 


29 


30 


CTC 


GAG 


Ava I 


Xho I 


A 


R 


39 


40 


GCG 


CGC 


BssH II 



107 



L 


V 


P 


M 


L 




16 


17 


18 


19 


20 




CTG 


GTA 


CCT 


ATG 


TTG 


137 



167 



197 



400 



Table 102a : Annotated Sequence 
of gene found in pGEM-MBl 
(continued) 

5 



10 

A 



I 


I 


R 


Y 


F 


Y 


N 


A 


K 


A 


41 


42 


43 


44 


45 


46 


47 


48 


49 


50 


ATC 


ATC 


CGC 


TAT 


TTC 


TAC 


AAT 


GCT 


AAA 


GC 


G 


L 


C 


Q 


T 


F 


V 


Y 


G 


G 


51 


52 


53 


54 


55 


56 


57 


58 


59 


60 


GGC 


CTG 


TGC 


CAG 


ACC 


TTT 


GTA 


TAC 


GGT 


GGT 


Stu 


it 










Acc I 


















Xca I 









C 


R 


A 


K 


R 


N 


N 


F 


K 








61 


62 


63 


64 


65 


66 


67 


68 


69 








TGC 


CGT 


GCT 


AAG 


CGT 


AAC 


AAC 


TTT 


AAA 




284 








ESD I 


a 
















S 


A 


E 


D 


c 


M 


R 


T 


C 


G 






70 


71 


72 


73 


74 


75 


76 


77 


78 


79 




01 25 


TCG 


GCC 


GAA 


GAT 


TGC 


ATG 


CGT 


ACC 


TGC 


GGT 


314 



30 



35 



40 



IXmall 



I Sph I] 



BPTI/M13 boundary 



G 


A 


r 

A 


E 


G 


D 


D 


P 


A 


K 


A 


A 


80 


81 


82 


83 


84 


85 


86 


87 


88 


89 


90 


91 


GGC 


GCC 


GCT 


GAA 


GGT 


GAT 


GAT 


CCG 


GCC 


AAG 


GCG 


GCC 


Bbe I 














Sj 


:i I 






Nar I 






















F 


N 


S 


L 


Q 


A 


S 


A 


T 








92 


93 


94 


95 


96 


97 


98 


99 


100 








TTC 


AAT 


TCT 


CTG 


CAA 


GCT 


TCT 


GCT 


ACC 






37*: 



- 350 



[Hind 3 1 



E 


Y 


I 


G 


Y 


A 


W 


101 


102 


103 


104 


105 


106 


107 


GAG 


TAT 


ATT 


GGT 


TAC 


GCG 


TGG 



A 


M 


V 


V 


V 


I 


V 


G 


A 


108 


109 


110 


111 


112 


113 


114 


115 


116 


GCC 


ATG 


GTG 


GTG 


GTT 


ATC 


GTT 


GGT 


GCT 



BstX I 



Nco 1 1 



401 



Table 102a : Annotated Sequence 
of gene found in pGEM-MBl 
(continued) 



T 


I 


6 


I 


117 


118 


119 


120 


ACC 


ATC 


GGG 


ATC 



437 



10 



15 



20 



K 


L 


F 


K 


K 


F 


T 


s 


K 


A 




121 


122 


123 


124 


125 


126 


127 


128 


129 


130 




AAA 


CTG 


TTC 


AAG 


AAG 


TTT 


ACT 


TCG 


AAG 


GCG 


467 














lAsu ] 


ELL 






S 


• 


• 


• 
















131 


132 


133 


134 
















TCT 


TAA 


TGA 


TAG 


GGTTACC- 








486 



BstE II 



AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT- 
terminator 



521 



25 



aTCGA ( GACc tgcaggca tgc ) - 3 ' 
( Sai l ) from pGEM poly linker 



Notes:

a Design called for Shine-Dalgarno sequence, AGGAGG,
but sequencing shows that actual constructed gene contains
AGAGG.

Note the following enzyme equivalences,

Xma III = Eag I          Acc III = BspM II
Dra II  = EcoO109 I      Asu II  = BstB I



402 



Table 102b ; Annotated Sequence of gene 
after insertion of Sai l linker 



nucleotide 
number 



10 



5 f -(GGATCC TCTAGA GTC) GGC- 
from pGEM poly linker 

tttaca CTTTATGCTTCCGGCTCG tataat GTGTGG- 
»35 lacUVS -10 



39 



15 aATTGTGAGCGcTcACAATT - 
lacO-syiam operator 



59 



20 



aaactc AGAGG CttaCT- 

SacI Shine-Dalgarno seq. 



77 





fM 


K 


K 


S 


L 


V 


L 


K 


A 


S 




25 


1 


2 


3 


4 


5 


6 


7 


8 


9 


10 






ATG 


AAG 


AAA 


TCT 


CTG 


GTT 


CTT 


AAG 


GCT 


AGC 


107 
















Afl 


. II 


Nhe I 





30 


V 


A 


V 


A 


T 


L 


V 


P 


M 


L 




11 


12 


13 


14 


15 


16 


17 


18 


19 


20 




GTT 


GCT 


GTC 


GCG 


ACC 


CTG 


GTA 


CCT 


ATG 


TTG 



Nru 



Kpn 1 1 



137 



35 



45 



S 


F 


21 


22 


TCC 


TTC 



A 
23 
GCT 



R 
24 
CGT 



P 
25 
CCG 



40 M13/BPTI Jnct 



AccIII | 



D 
26 
GAT 



P 
31 



p 


Y 


T 


G 


P 




32 


33 


34 


35 


36 




CCA 


TAC 


ACT 


GGG 


CCC 


T 


Pf 1M I 


1 












Ana I 








Dra II 




Pss I 



F 
27 
TTC 



C 
37 



C 
28 
TGT 



K 

38 
AAA 



L 


E 


29 


30 


CTC 


GAG 


Ava I 


Xho I 


A 


R 


39 


40 


GCG 


CGC 


BssH II 



167 



197 



403 



Table 102b : Annotated Sequence 
of gene after insertion of Sai l linker 
(continued) 



I 


I 


R 


Y 


F 


y 


N 


A 


K 


A 


41 


42 


43 


44 


45 


46 


47 


48 


49 


50 


ATC 


ATC 


CGC 


TAT 


TTC 


TAG 


AAT 


GCT 


AAA 


GC 



15 





G 


L 


c 


Q 


T 


F 


V 


Y 


G 


G 




51 


52 


53 


54 


55 


56 


57 


58 


59 


60 


A 


GGC 


CTG 


TGC 


CAG 


ACC 


TTT 


GTA 


TAC 


GGT 


GGT 


1 


stu 












Acc I 


















Xca I 







20 



30 



35 



40 



C 

61 
TGC 



R 
62 
CGT 



A 
63 
GCT 
ESP 



K 
64 
AAG 
I 



R 
65 
CGT 



N 
66 
AAC 





S 


A 


E 


D 


C 




70 


71 


72 


73 


74 


25 


TCG 


GCC 


GAA 


GAT 


TGC 



IXmalll 



M 
75 
ATG 
Sph 



N 
67 
AAC 



R 
76 
CGT 





V 


G 


A 


80 


81 


GGC 


GCC 


Bbe I 


Nar I 


F 


N 


92 


93 


TTC 


AAT 



BPTI/M13 boundary 



A 
82 
GCT 



S 

94 
TCT 



E 
83 
GAA 



L 
95 
CTG 



G 
84 
GGT 



Q 
96 
CAA 



D 
85 
GAT 



A 
97 
GCT 



D 
86 
GAT 



S 

98 
TCT 



F 


K 




68 


69 




TTT 


AAA 




T 


C 


G 


77 


78 


79 


ACC 


TGC 


GGT 


P 


A 


K 


87 


88 


89 


CCG 


GCC 


AAG 


J- 


SI 


:i I 


A 


T 




99 


100 




GCT 


ACC 





A 
90 
GCG 



284 



314 



A 
91 
GCC 



- 350 



377 



IHind 3 1 



45 



E 


Y 


I 


G 


Y 


A 


W 


101 


102 


103 


104 


105 


106 


107 


GAG 


TAT 


ATT 


GGT 


TAC 


GCG 


TGG 



398 





A 


M 


V 


V 


V 


I 


V 


G 


A 




108 


109 


110 


111 


112 


113 


114 


115 


116 


50 


GCC 


ATG 


GTG 


GTG 


GTT 


ATC 


GTT 


GGT 


GCT 



BstX I 
Nco II 



425 



404 



Table 102b: Annotated Sequence 
after insertion of Sai l linker 
(continued) 



T 


I" 


G 


I 


117 


118 


119 


120 


ACC 


ATC 


GGG 


ATC 



437 



10 



15 



20 



K 


L 


F 


K 


K 


F 


T 


S 


K 


A 


121 


122 


123 


124 


125 


126 


127 


128 


129 


130 


AAA 


CTG 


TTC 


AAG 


AAG 


TTT 


ACT 


TCG 


AAG 


GCG 














lAsu ] 


Cli 




S 




















131 


132 


133 


134 














TCT 


TAA 


TGA 


TAG 


GGTTACC- 









467 



486 



BstE II 



AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT- 
terminator 



521 



25 



aTCGA GACctgca GGTCGACC ggcatgc-3 1 

I sail 1 



Note the following enzyme equivalences,

Xma III = Eag I          Acc III = BspM II
Dra II  = EcoO109 I      Asu II  = BstB I



35 



405 



Table 102 : Annotated Sequence 
of osp-iobd gene 
(continued) 

Table 102c: Calculated properties of Peptide

For the apoprotein
Molecular weight of peptide      16192
Charge on peptide                9
[A+G+P]                          36
[C+F+H+I+L+M+V+W+Y]              48
[D+E+K+R+N+Q+S+T+.]              48

For the mature protein
Molecular weight of peptide      13339
Charge on peptide                6
[A+G+P]                          31
[C+F+H+I+L+M+V+W+Y]              37
[D+E+K+R+N+Q+S+T+.]              41



Table 102d: Codon Usage

First            Second Base            Third
Base        t      c      a      g      base
 t          3      4      2      1       t
            5      1      4      5       c
            0      0      0      0       a
            1      2      0      1       g
 c          1      1      0      4       t
            1      1      0      2       c
            0      2      1      0       a
            5      2      1      0       g
 a          1      2      2      0       t
            5      5      2      1       c
            0      0      5      0       a
            4      0      7      0       g
 g          4      9      4      6       t
            1      5      0      2       c
            2      1      2      0       a
            2      5      2      2       g
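
A codon-usage table of this kind is a tally of in-frame triplets. The sketch below shows one way such a tally could be produced. The function name `codon_usage` is hypothetical, and the input is only the first ten codons of the gene as listed in the annotated sequence (the M13 leader region), not the full coding sequence.

```python
from collections import Counter

def codon_usage(cds):
    """Count in-frame codons in a coding sequence (whitespace is ignored)."""
    seq = "".join(cds.split()).lower()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return Counter(seq[i:i + 3] for i in range(0, len(seq), 3))

# First ten codons of the gene, as listed in the annotated sequence.
leader = "ATG AAG AAA TCT CTG GTT CTT AAG GCT AGC"
usage = codon_usage(leader)

for codon, count in sorted(usage.items()):
    print(codon, count)
```

Applying the same function to the whole reading frame and arranging the counts by first, second, and third base would give a grid laid out like the table above.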



406 



Table 102e: Amino-acid frequency

Encoded polypeptide

AA     #      AA     #
C      6      D      4
G      10     H      0
L      8      M      4
Q      2      R      6
V      9      W      1

Mature protein

AA     #      AA     #
C      6      D      4
G      10     H      0
L      4      M      2
Q      2      R      6
V      5      W      1



407 



Table 102f: Enzymes used to manipulate BPTI-gp8 fusion

SacI                 GAGCT|C
AflII                C|TTAAG
NheI                 G|CTAGC
NruI                 TCG|CGA
KpnI                 GGTAC|C
AccIII = BspMII      T|CCGGA
AvaI                 C|yCGrG
XhoI                 C|TCGAG
PflMI                CCAnnnn|nTGG
BssHII               G|CGCGC
ApaI                 GGGCC|C
DraII = EcoO109I     rGGnC|Cy (Same as PssI)
StuI                 AGG|CCT
AccI                 GT|mkAC
XcaI                 GTA|TAC
EspI                 GC|TnAGC
XmaIII               C|GGCCG (Supplier ?)
SphI                 GCATG|C
BbeI                 GGCGC|C (Supplier ?)
NarI                 GG|CGCC
SfiI                 GGCCnnnn|nGGCC
HindIII              A|AGCTT
BstXI                CCAnnnnn|nTGG
NcoI                 C|CATGG
AsuII = BstBI        TT|CGAA
BstEII               G|GTnACC
SalI                 G|TCGAC
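
Recognition sequences such as those listed above can be located in a DNA string by translating the degenerate symbols (n, r, y, m, k) into character classes. The following is a minimal sketch rather than the procedure used in the specification: the helper `find_sites` is hypothetical, the fragment is codons 26-40 of the gene as transcribed from the annotated sequence listings, and positions are 0-based offsets within that fragment.

```python
import re

# Degenerate symbols used in the recognition-site column (n = any base, etc.).
SYMBOLS = {"A": "A", "C": "C", "G": "G", "T": "T",
           "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]", "N": "[ACGT]"}

def find_sites(seq, site):
    """Return 0-based start positions of every (possibly overlapping) match."""
    pattern = "".join(SYMBOLS[ch] for ch in site.upper())
    # A lookahead lets overlapping occurrences be reported as well.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]

# Codons 26-40 of the gene (assumed transcription of the listing).
fragment = "".join(
    "GAT TTC TGT CTC GAG CCA CCA TAC ACT GGG CCC TGC AAA GCG CGC".split()
)

sites = {"XhoI": "CTCGAG", "PflMI": "CCANNNNNTGG", "ApaI": "GGGCCC", "BssHII": "GCGCGC"}
for name, site in sites.items():
    print(name, find_sites(fragment, site))
```

Each of the four enzymes matches once in this fragment, consistent with the site labels placed under codons 29-40 in the annotated sequence. Cut positions (the "|" marks) are not modeled here.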



408 

Table 103 : Annotated Sequence of osp-ipbd gene

Underscored bases indicate sites of overlap between
annealed synthetic duplexes.

5'-/GGC tttaca CTTTAT , GCTTCCGGCTCG tataat GTGTGG-
   lacUV5

   aATTGTGAGCGcTcACAATT-
   lacO-symm operator

   gagctc AG(G1 /AGG CttaCT-
   Sac I  Shine-Dalgarno seq.

   fM   K    K    S    L    V    L    K    A    S
   1    2    3    4    5    6    7    8    9    10
   ATG  AAG, AAA  TCT  CTG  GTT  CTT  AAG  GCT  AGC
   Sites: Afl II, Nhe I

   V    A    V    A    T    L    V    P    M    L
   11   12   13   14   15   16   17   18   19   20
   GTT  GCT  GTC  GCG  ACC  CTG  GTA  CCT  ATG  T/TG
   Sites: Nru I, Kpn I

   S    F    A    R    P    D    F    C    L    E
   21   22   23   24   25   26   27   28   29   30
   TCC  TTC  GCT  CGT  CCG  GAT  TTC  TGT  CTC  GAG
                 ^ M13/BPTI Jnct
   Sites: Acc III, Ava I, Xho I

   P    P    Y    T    G    P    C    K    A    R
   31   32   33   34   35   36   37   38   39   40
   CCA  CCA  TAC  ACT  GGG  CCC  TGC  AAA  GCG  CGC
   Sites: PflM I, Apa I, Dra II, Pss I, BssH II

   I    I    R    Y    F    Y    N    A    K    A
   41   42   43   44   45   46   47   48   49   50
   ATC  ATC  CG/C TAT  TTC  TAC  AAT  GC,T AAA  GCA

   G    L    C    Q    T    F    V    Y    G    G
   51   52   53   54   55   56   57   58   59   60
   GGC  CTG  TGC  CAG  ACC  TTT  GTA  TAC  GGT  GGT
   Sites: Stu I, Acc I, Xca I

   C    R    A    K    R    N    N    F    K
   61   62   63   64   65   66   67   68   69
   TGC  CGT  GCT  AAG  CGT  /AAC AAC  TTT  AAA
   Sites: Esp I

   S    A    E    D    C    M    R    T    C    G
   70   71   72   73   74   75   76   77   78   79
   TCG, GCC  GAA  GAT  TGC  ATG  CGT  ACC  TGC  GGT
   Sites: Xma III, Sph I

   G    A
   80   81
   GGC  GCC
            ^ BPTI/M13 boundary
   Sites: Bbe I, Nar I

   A    E    G    D    D    P    A    K    A    A
   82   83   84   85   86   87   88   89   90   91
   GCT  GAA  GGT  GAT  GAT  CCG  GCC  AAG  GCG  G/CC
   Sites: Sfi I

   F    N    S    L    Q    A    S    A    T
   92   93   94   95   96   97   98   99   100
   TTC  AAT  TCT  CTG  C,AA GCT  TCT  GCT  ACC
   Sites: Hind 3

   E    Y    I    G    Y    A    W
   101  102  103  104  105  106  107
   GAG  TAT  ATT  GGT  TAC  GCG  TGG

   A    M    V    V    V    I    V    G    A
   108  109  110  111  112  113  114  115  116
   GCC  ATG  GTG  GTG  GTT  AT/C GTT  GGT  GCT
   Sites: BstX I, Nco I

   T    I    G    I
   117  118  119  120
   ACC, ATC  GGG  ATC

   K    L    F    K    K    F    T    S    K    A
   121  122  123  124  125  126  127  128  129  130
   AAA  CTG  TTC  AAG  AAG  TTT  ACT  TCG  AAG  GCG
   Sites: Asu II

   S    .    .    .
   131  132  133  134
   TCT  TAA  TGA  TAG  GGTTA/CC-
   Sites: BstE II

   AGTCTA AGCCC,GC CTAATGA GCGGGCT TTTTTTTT-
   terminator

   a/(TCGA)-3'
   (Sal I)



[Tables on rotated pages between Table 103 and Table 107: text not legible in this copy.]
Table 107: In vitro transcription/translation
analysis of vector-encoded
signal::BPTI::mature VIII protein species

                     31 kd species a    14.5 kd species b

No DNA (control)          - c
pGEM-3Zf(-)               +
pGEM-MB16                 +
pGEM-MB20                 +                  +
pGEM-MB26                 +                  +
pGEM-MB42                 +                  +
pGEM-MB46                 ND                 ND

Notes:

a) pre-beta-lactamase, encoded by the amp (bla) gene.

b) pre-BPTI/VIII peptides encoded by the synthetic
gene and derived constructs.

c) - for absence of product; + for presence of
product; ND for Not Determined.
Table 108: Western analysis a of in vivo expressed
signal::BPTI::mature VIII protein species

A) expression in strain XL1-Blue

               signal    14.5 kd species b    12 kd species c

pGEM-3Zf(-)                    - d
pGEM-MB16      VIII
pGEM-MB20      VIII            ++
pGEM-MB26      VIII            +++                 +/-
pGEM-MB42      phoA            ++                  +

B) expression in strain SEF'

               signal    14.5 kd species b    12 kd species c

pGEM-MB42      phoA            +/-                 +++

Notes:

a) Analysis using rabbit anti-BPTI polyclonal
antibodies and horse-radish-peroxidase-conjugated goat
anti-rabbit IgG antibody.

b) pro-BPTI/VIII peptides encoded by the synthetic
gene and derived constructs.

c) processed BPTI/VIII peptide encoded by the
synthetic gene.

d) not present            -
   weakly present         +/-
   present                +
   strong presence        ++
   very strong presence   +++







Table 109: M13 gene III

1579 5'-GT GAAAAAATTA TTATTCGCAA TTCCTTTAGT
1611 TGTTCCTTTC TATTCTCACT CCGCTGAAAC TGTTGAAAGT
1651 TGTTTAGCAA AACCCCATAC AGAAAATTCA TTTACTAACG
1691 TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA
1731 TGAGGGTTGT CTGTGGAATG CTACAGGCGT TGTAGTTTGT
1771 ACTGGTGACG AAACTCAGTG TTACGGTACA TGGGTTCCTA
1811 TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA
1851 GGGTGGCGGT TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT
1891 ACTAAACCTC CTGAGTACGG TGATACACCT ATTCCGGGCT
1931 ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG
1971 TACTGAGCAA AACCCCGCTA ATCCTAATCC TTCTCTTGAG
2011 GAGTCTCAGC CTCTTAATAC TTTCATGTTT CAGAATAATA
2051 GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG
2091 CACTGTTACT CAAGGCACTG ACCCCGTTAA AACTTATTAC
2131 CAGTACACTC CTGTATCATC AAAAGCCATG TATGACGCTT
2171 ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG
2211 CTTTAATGAG GATCCATTCG TTTGTGAATA TCAAGGCCAA
2251 TCGTCTGACC TGCCTCAACC TCCTGTCAAT GCTGGCGGCG
2291 GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG
2331 CTCTGAGGGT GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA
2371 GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT GATTTTGATT
2411 ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA
2451 AAATGCCGAT GAAAACGCGC TACAGTCTGA CGCTAAAGGC
2491 AAACTTGATT CTGTCGCTAC TGATTACGGT GCTGCTATCG
2531 ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA
2571 TGGTGCTACT GGTGATTTTG CTGGCTCTAA TTCCCAAATG
2611 GCTCAAGTCG GTGACGGTGA TAATTCACCT TTAATGAATA
2651 ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA
2691 ATGTCGCCCT TTTGTCTTTA GCGCTGGTAA ACCATATGAA
2731 TTTTCTATTG ATTGTGACAA AATAAACTTA TTCCGTGGTG
2771 TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT
2811 ATTTTCTACG TTTGCTAACA TACTGCGTAA TAAGGAGTCT
2851 TAATCATGCC AGTTCTTTTG GGTATTCCGT
Table 110: Introduction of NarI into gene III

A) Wild-type III, portion encoding the signal peptide

          M   K   K   L   L   F   A   I   P   L
          1   2   3   4   5   6   7   8   9  10
1579  5'-GTG AAA AAA TTA TTA TTC GCA ATT CCT TTA

                                        ↓ Cleavage site
          V   V   P   F   Y   S   H   S   A   E   T   V
         11  12  13  14  15  16  17  18  19  20  21  22
1609     GTT GTT CCT TTC TAT TCT CAC TCC GCT GAA ACT GTT-3'

B) III, portion encoding the signal peptide with NarI site

          m   k   k   l   l   f   a   i   p   l
          1   2   3   4   5   6   7   8   9  10
1579  5'-gtg aaa aaa tta tta ttc gca att cct tta

                                        ↓ cleavage site
          v   v   p   f   y   s   G   A   a   e   t   v
         11  12  13  14  15  16  17  18  19  20  21  22
1609     gtt gtt cct ttc tat tct GGc Gcc gct gaa act gtt-3'
Table 111: IIIsp::bpti::matureIII fusion gene.

      m    k    k    l    l    f    a    i    p    l
      1    2    3    4    5    6    7    8    9    10
   5'-gtg  aaa  aaa  tta  tta  ttc  gca  att  cct  tta
      |<-- gene III signal peptide

      v    v    p    f    y    s    G    A
      11   12   13   14   15   16   17   18
      gtt  gtt  cct  ttc  tat  tct  GGc  Gcc
                                      -->|  ↓ cleavage site

      R    P    D    F    C    L    E
      19   20   21   22   23   24   25
      CGT  CCG  GAT  TTC  TGT  CTC  GAG
      M13/BPTI Jnct
      Sites: Acc III, Ava I, Xho I

      P    P    Y    T    G    P    C    K    A    R
      26   27   28   29   30   31   32   33   34   35
      CCA  CCA  TAC  ACT  GGG  CCC  TGC  AAA  GCG  CGC
      Sites: PflM I, Apa I, Dra II, Pss I, BssH II

      I    I    R    Y    F    Y    N    A    K    A
      36   37   38   39   40   41   42   43   44   45
      ATC  ATC  CGC  TAT  TTC  TAC  AAT  GCT  AAA  GCA

      G    L    C    Q    T    F    V    Y    G    G
      46   47   48   49   50   51   52   53   54   55
      GGC  CTG  TGC  CAG  ACC  TTT  GTA  TAC  GGT  GGT
      Sites: Stu I, Acc I, Xca I

      C    R    A    K    R    N    N    F    K
      56   57   58   59   60   61   62   63   64
      TGC  CGT  GCT  AAG  CGT  AAC  AAC  TTT  AAA
      Sites: Esp I

Table 111, continued

      S    A    E    D    C    M    R    T    C    G
      65   66   67   68   69   70   71   72   73   74
      TCG  GCC  GAA  GAT  TGC  ATG  CGT  ACC  TGC  GGT
      Sites: Xma III, Sph I

      G    A
      75   76
      GGC  GCC
      Sites: Bbe I, Nar I
      BPTI/M13 boundary

      G    A    a    e    t    v    e    s
      77   78   79   80   81   82   83   84
      GGc  Gcc  gct  gaa  act  gtt  GAA  AGT

1651 TGTTTAGCAA AACCCCATAC AGAAAATTCA TTTACTAACG
1691 TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA
1731 TGAGGGTTGT CTGTGGAATG CTACAGGCGT TGTAGTTTGT
1771 ACTGGTGACG AAACTCAGTG TTACGGTACA TGGGTTCCTA
1811 TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA
1851 GGGTGGCGGT TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT
1891 ACTAAACCTC CTGAGTACGG TGATACACCT ATTCCGGGCT
1931 ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG
1971 TACTGAGCAA AACCCCGCTA ATCCTAATCC TTCTCTTGAG
2011 GAGTCTCAGC CTCTTAATAC TTTCATGTTT CAGAATAATA
2051 GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG
2091 CACTGTTACT CAAGGCACTG ACCCCGTTAA AACTTATTAC
2131 CAGTACACTC CTGTATCATC AAAAGCCATG TATGACGCTT
2171 ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG
2211 CTTTAATGAG GATCCATTCG TTTGTGAATA TCAAGGCCAA
2251 TCGTCTGACC TGCCTCAACC TCCTGTCAAT GCTGGCGGCG
2291 GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG
2331 CTCTGAGGGT GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA
2371 GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT GATTTTGATT
2411 ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA
2451 AAATGCCGAT GAAAACGCGC TACAGTCTGA CGCTAAAGGC

Table 111, continued

2491 AAACTTGATT CTGTCGCTAC TGATTACGGT GCTGCTATCG
2531 ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA
2571 TGGTGCTACT GGTGATTTTG CTGGCTCTAA TTCCCAAATG
2611 GCTCAAGTCG GTGACGGTGA TAATTCACCT TTAATGAATA
2651 ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA
2691 ATGTCGCCCT TTTGTCTTTA GCGCTGGTAA ACCATATGAA
2731 TTTTCTATTG ATTGTGACAA AATAAACTTA TTCCGTGGTG
2771 TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT
2811 ATTTTCTACG TTTGCTAACA TACTGCGTAA TAAGGAGTCT
2851 TAATCATGCC AGTTCTTTTG GGTATTCCGT
Table 112 : Annotated Sequence of
Ptac::RBS(GGAGGAAATAAA)::
VIII-signal::mature-bpti::mature-VIII-coat-protein gene

5'-GGATCC actccccatcccc
   BamHI

   ctg TTGACA attaatcatcgGCTCG tataat GTGTGG-
       -35        tac          -10

   aATTGTGAGCGcTcACAATT-
   lacO-symm operator

   gagctc t ggagga aataaa-
   SacI     Shine-Dalgarno seq.

   fM   K    K    S    L    V    L    K    A    S
   1    2    3    4    5    6    7    8    9    10
   ATG  AAG  AAA  TCT  CTG  GTT  CTT  AAG  GCT  AGC
   Sites: Afl II, Nhe I

   V    A    V    A    T    L    V    P    M    L
   11   12   13   14   15   16   17   18   19   20
   GTT  GCT  GTC  GCG  ACC  CTG  GTA  CCT  ATG  TTG
   Sites: Nru I, Kpn I

   S    F    A    R    P    D    F    C    L    E
   21   22   23   24   25   26   27   28   29   30
   TCC  TTC  GCT  CGT  CCG  GAT  TTC  TGT  CTC  GAG
                 ^ M13/BPTI Jnct
   Sites: Acc III, Ava I, Xho I

   P    P    Y    T    G    P    C    K    A    R
   31   32   33   34   35   36   37   38   39   40
   CCA  CCA  TAC  ACT  GGG  CCC  TGC  AAA  GCG  CGC
   Sites: PflM I, Apa I, Dra II, Pss I, BssH II

   I    I    R    Y    F    Y    N    A    K    A
   41   42   43   44   45   46   47   48   49   50
   ATC  ATC  CGC  TAT  TTC  TAC  AAT  GCT  AAA  GCA

   G    L    C    Q    T    F    V    Y    G    G
   51   52   53   54   55   56   57   58   59   60
   GGC  CTG  TGC  CAG  ACC  TTT  GTA  TAC  GGT  GGT
   Sites: Stu I, Acc I, Xca I

   C    R    A    K    R    N    N    F    K
   61   62   63   64   65   66   67   68   69
   TGC  CGT  GCT  AAG  CGT  AAC  AAC  TTT  AAA
   Sites: Esp I

   S    A    E    D    C    M    R    T    C    G
   70   71   72   73   74   75   76   77   78   79
   TCG  GCC  GAA  GAT  TGC  ATG  CGT  ACC  TGC  GGT
   Sites: Xma III, Sph I

   G    A
   80   81
   GGC  GCC
            ^ BPTI/M13 boundary
   Sites: Bbe I, Nar I

   A    E    G    D    D    P    A    K    A    A
   82   83   84   85   86   87   88   89   90   91
   GCT  GAA  GGT  GAT  GAT  CCG  GCC  AAG  GCG  GCC
   Sites: Sfi I

   F    N    S    L    Q    A    S    A    T
   92   93   94   95   96   97   98   99   100
   TTC  AAT  TCT  CTG  CAA  GCT  TCT  GCT  ACC
   Sites: Hind 3

   E    Y    I    G    Y    A    W
   101  102  103  104  105  106  107
   GAG  TAT  ATT  GGT  TAC  GCG  TGG

   A    M    V    V    V    I    V    G    A
   108  109  110  111  112  113  114  115  116
   GCC  ATG  GTG  GTG  GTT  ATC  GTT  GGT  GCT
   Sites: BstX I, Nco I

   T    I    G    I
   117  118  119  120
   ACC  ATC  GGG  ATC

   K    L    F    K    K    F    T    S    K    A
   121  122  123  124  125  126  127  128  129  130
   AAA  CTG  TTC  AAG  AAG  TTT  ACT  TCG  AAG  GCG
   Sites: Asu II

   S    .    .    .
   131  132  133  134
   TCT  TAA  TGA  TAG  GGTTACC-
   Sites: BstE II

   AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT-
   terminator

   aTCGA GACctgca GGTCGACC ggcatgc-3'
                  | Sal I |
Table 113 : Annotated Sequence of
pGEM-MB42 comprising Ptac::RBS(GGAGGAAATAAA)::
phoA-signal::mature-bpti::mature-VIII-coat-protein gene

5'-GGATCC actccccatcccc
   BamHI

   ctg TTGACA attaatcatcgGCTCG tataat GTGTGG-
       -35        tac          -10

   aATTGTGAGCGcTcACAATT-
   lacO-symm operator

   GAGCTCCATGGGAGAAAATAAA
   |SacI|

   M    K    Q    S    T
   1    2    3    4    5
   ATG  AAA  CAA  AGC  ACG
   |<-- phoA signal peptide

   I    A    L    L    P    L    L    F    T    P    V    T
   6    7    8    9    10   11   12   13   14   15   16   17
   ATC  GCA  CTC  TTA  CCG  TTA  CTG  TTT  ACC  CCT  GTG  ACA
   phoA signal continues

   (There are no residues 20-23.)

   K    A    R    P    D    F    C    L    E
   18   19   24   25   26   27   28   29   30
   AAA  GCC  CGT  CCG  GAT  TTC  TGT  CTC  GAG
   phoA signal->|<- BPTI insert
            ^ phoA/BPTI Jnct
   Sites: Acc III, Ava I, Xho I

   P    P    Y    T    G    P    C    K    A    R
   31   32   33   34   35   36   37   38   39   40
   CCA  CCA  TAC  ACT  GGG  CCC  TGC  AAA  GCG  CGC
   Sites: PflM I, Apa I, Dra II, Pss I, BssH II

   I    I    R    Y    F    Y    N    A    K    A
   41   42   43   44   45   46   47   48   49   50
   ATC  ATC  CGC  TAT  TTC  TAC  AAT  GCT  AAA  GCA

   G    L    C    Q    T    F    V    Y    G    G
   51   52   53   54   55   56   57   58   59   60
   GGC  CTG  TGC  CAG  ACC  TTT  GTA  TAC  GGT  GGT
   Sites: Stu I, Acc I, Xca I

   C    R    A    K    R    N    N    F    K
   61   62   63   64   65   66   67   68   69
   TGC  CGT  GCT  AAG  CGT  AAC  AAC  TTT  AAA
   Sites: Esp I

   S    A    E    D    C    M    R    T    C    G
   70   71   72   73   74   75   76   77   78   79
   TCG  GCC  GAA  GAT  TGC  ATG  CGT  ACC  TGC  GGT
   BPTI insert
   Sites: Xma III, Sph I

   G    A
   80   81
   GGC  GCC
   BPTI -->|
            ^ BPTI/M13 boundary
   Sites: Bbe I, Nar I

   A    E    G    D    D    P    A    K    A    A
   82   83   84   85   86   87   88   89   90   91
   GCT  GAA  GGT  GAT  GAT  CCG  GCC  AAG  GCG  GCC
   mature gene VIII coat protein
   Sites: Sfi I

   F    N    S    L    Q    A    S    A    T
   92   93   94   95   96   97   98   99   100
   TTC  AAT  TCT  CTG  CAA  GCT  TCT  GCT  ACC
   Sites: Hind 3

   E    Y    I    G    Y    A    W
   101  102  103  104  105  106  107
   GAG  TAT  ATT  GGT  TAC  GCG  TGG

   A    M    V    V    V    I    V    G    A
   108  109  110  111  112  113  114  115  116
   GCC  ATG  GTG  GTG  GTT  ATC  GTT  GGT  GCT
   Sites: BstX I, Nco I

   T    I    G    I
   117  118  119  120
   ACC  ATC  GGG  ATC

   K    L    F    K    K    F    T    S    K    A
   121  122  123  124  125  126  127  128  129  130
   AAA  CTG  TTC  AAG  AAG  TTT  ACT  TCG  AAG  GCG
   Sites: Asu II

   S    .    .    .
   131  132  133  134
   TCT  TAA  TGA  TAG  GGTTACC-
   Sites: BstE II

   AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT-
   terminator

   aTCGA GACctgca GGTCGAC-3'
                  | Sal I |
Table 114: Neutralization of Phage Titer Using
Agarose-immobilized Anhydro-Trypsin

                               Percent Residual Titer
Phage Type    Addition           1       2       4

MK-BPTI       5 µl IS           99     104     105
              2 µl IAT          82      71      51
              5 µl IAT          57      40      27
              10 µl IAT         40      30      24

MK            5 µl IS          106      96      98
              2 µl IAT          97     103      95
              5 µl IAT         110     111      96
              10 µl IAT         99      93     106

Legend:

IS = Immobilized streptavidin
IAT = Immobilized anhydro-trypsin
Table 115: Affinity Selection of MK-BPTI Phage
on Immobilized Anhydro-Trypsin

                               Percent of Total Phage
Phage Type    Addition         Recovered in Elution Buffer

MK-BPTI       5 µl IS          <<1 a
              2 µl IAT         5
              5 µl IAT         20
              10 µl IAT        50

MK            5 µl IS          <<1 a
              2 µl IAT         <<1
              5 µl IAT         <<1
              10 µl IAT        <<1

Legend:

IS = Immobilized streptavidin
IAT = Immobilized anhydro-trypsin
a not detectable.
Table 130: Sampling of a Library encoded by (NNK)6

A. Numbers of hexapeptides in each class
   total = 64,000,000 stop-free sequences.

   α can be one of [WMFYCIKDENHQ]
   Φ can be one of [PTAVG]
   Ω can be one of [SLR]

   αααααα    2985984.      Φααααα    7464960.
   Ωααααα    4478976.      ΦΦαααα    7776000.
   ΦΩαααα    9331200.      ΩΩαααα    2799360.
   ΦΦΦααα    4320000.      ΦΦΩααα    7776000.
   ΦΩΩααα    4665600.      ΩΩΩααα     933120.
   ΦΦΦΦαα    1350000.      ΦΦΦΩαα    3240000.
   ΦΦΩΩαα    2916000.      ΦΩΩΩαα    1166400.
   ΩΩΩΩαα     174960.      ΦΦΦΦΦα     225000.
   ΦΦΦΦΩα     675000.      ΦΦΦΩΩα     810000.
   ΦΦΩΩΩα     486000.      ΦΩΩΩΩα     145800.
   ΩΩΩΩΩα      17496.      ΦΦΦΦΦΦ      15625.
   ΦΦΦΦΦΩ      56250.      ΦΦΦΦΩΩ      84375.
   ΦΦΦΩΩΩ      67500.      ΦΦΩΩΩΩ      30375.
   ΦΩΩΩΩΩ       7290.      ΩΩΩΩΩΩ        729.

ΦΦΩΩαα, for example, stands for the set of peptides having
two amino acids from the α class, two from Φ, and two from
Ω arranged in any order. There are, for example, 729 = 3^6
sequences composed entirely of S, L, and R.
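The class sizes in part A follow from simple counting: choose which positions hold a residue from each group, then choose the residue at each position. The sketch below reproduces that count under this reading of the table; the function names `class_size` and `all_classes` are introduced here for illustration only.

```python
from math import factorial

# Amino acids grouped as in part A of Table 130.
ONE_CODON = "WMFYCIKDENHQ"   # 12 residues
TWO_CODON = "PTAVG"          # 5 residues
THREE_CODON = "SLR"          # 3 residues


def class_size(n_one: int, n_two: int, n_three: int) -> int:
    """Number of peptides with the given count of residues from each group."""
    n = n_one + n_two + n_three
    # Ways to assign the positions to the three groups (multinomial coefficient).
    arrangements = factorial(n) // (
        factorial(n_one) * factorial(n_two) * factorial(n_three)
    )
    return (
        arrangements
        * len(ONE_CODON) ** n_one
        * len(TWO_CODON) ** n_two
        * len(THREE_CODON) ** n_three
    )


def all_classes(length: int = 6):
    """Yield ((n_one, n_two, n_three), size) for every composition class."""
    for n_one in range(length, -1, -1):
        for n_two in range(length - n_one, -1, -1):
            n_three = length - n_one - n_two
            yield (n_one, n_two, n_three), class_size(n_one, n_two, n_three)


if __name__ == "__main__":
    total = 0
    for composition, size in all_classes():
        total += size
        print(composition, size)
    print("total", total)   # 20**6 hexapeptides
```

Summing over all 28 classes gives 20^6 = 64,000,000, the total stated at the head of part A.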

Table 130: Sampling of a Library encoded by (NNK)6
(continued)

B. Probability that any given stop-free DNA sequence
   will encode a hexapeptide from a stated class.

   Class          P            % of class

   αααααα...   3.364E-03    (1.13E-07)
   Φααααα...   1.682E-02    (2.25E-07)
   Ωααααα...   1.514E-02    (3.38E-07)
   ΦΦαααα...   3.505E-02    (4.51E-07)
   ΦΩαααα...   6.308E-02    (6.76E-07)
   ΩΩαααα...   2.839E-02    (1.01E-06)
   ΦΦΦααα...   3.894E-02    (9.01E-07)
   ΦΦΩααα...   1.051E-01    (1.35E-06)
   ΦΩΩααα...   9.463E-02    (2.03E-06)
   ΩΩΩααα...   2.839E-02    (3.04E-06)
   ΦΦΦΦαα...   2.434E-02    (1.80E-06)
   ΦΦΦΩαα...   8.762E-02    (2.70E-06)
   ΦΦΩΩαα...   1.183E-01    (4.06E-06)
   ΦΩΩΩαα...   7.097E-02    (6.08E-06)
   ΩΩΩΩαα...   1.597E-02    (9.13E-06)
   ΦΦΦΦΦα...   8.113E-03    (3.61E-06)
   ΦΦΦΦΩα...   3.651E-02    (5.41E-06)
   ΦΦΦΩΩα...   6.571E-02    (8.11E-06)
   ΦΦΩΩΩα...   5.914E-02    (1.22E-05)
   ΦΩΩΩΩα...   2.661E-02    (1.83E-05)
   ΩΩΩΩΩα...   4.790E-03    (2.74E-05)
   ΦΦΦΦΦΦ...   1.127E-03    (7.21E-06)
   ΦΦΦΦΦΩ...   6.084E-03    (1.08E-05)
   ΦΦΦΦΩΩ...   1.369E-02    (1.62E-05)
   ΦΦΦΩΩΩ...   1.643E-02    (2.43E-05)
   ΦΦΩΩΩΩ...   1.109E-02    (3.65E-05)
   ΦΩΩΩΩΩ...   3.992E-03    (5.48E-05)
   ΩΩΩΩΩΩ...   5.988E-04    (8.21E-05)
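The probabilities in part B are consistent with treating each of the 31 stop-free NNK codons (32 codons less TAG) as equally likely, so that a position carries a one-codon residue with probability 12/31, a two-codon residue with probability 10/31, and a three-codon residue with probability 9/31. The following sketch works under that assumption; the function names are illustrative and do not come from the source.

```python
from fractions import Fraction
from math import factorial

STOP_FREE_CODONS = 31        # 32 NNK codons minus the TAG stop
CODONS_PER_GROUP = (12, 10, 9)   # one-, two-, three-codon amino-acid groups


def class_probability(n_one: int, n_two: int, n_three: int) -> Fraction:
    """Probability that a random stop-free (NNK)n sequence falls in the class."""
    n = n_one + n_two + n_three
    arrangements = factorial(n) // (
        factorial(n_one) * factorial(n_two) * factorial(n_three)
    )
    p1, p2, p3 = (Fraction(c, STOP_FREE_CODONS) for c in CODONS_PER_GROUP)
    return arrangements * p1 ** n_one * p2 ** n_two * p3 ** n_three


def percent_of_class_per_sequence(n_one: int, n_two: int, n_three: int) -> float:
    """Percent of the class covered by one DNA sequence.

    Every peptide in a class is encoded by the same number of DNA
    sequences, 2**n_two * 3**n_three, out of 31**n stop-free sequences.
    """
    n = n_one + n_two + n_three
    return 100.0 * (2 ** n_two) * (3 ** n_three) / STOP_FREE_CODONS ** n


if __name__ == "__main__":
    for comp in [(6, 0, 0), (3, 2, 1), (2, 2, 2), (0, 0, 6)]:
        p = float(class_probability(*comp))
        pct = percent_of_class_per_sequence(*comp)
        print(comp, f"{p:.3E}", f"({pct:.2E})")
```

Because the 28 class probabilities are exact fractions here, they can be checked to sum to exactly 1.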
Table 130: Sampling of a Library encoded by (NNK)6
(continued)

C. Number of different stop-free amino-acid sequences in
   each class expected for various library sizes

Library size = 1.0000E+06
total = 9.7446E+05    % sampled = 1.52

   Class          Number              Class          Number

   αααααα...     3362.6(  0.1)     Φααααα...    16803.4(  0.2)
   Ωααααα...    15114.6(  0.3)     ΦΦαααα...    34967.8(  0.4)
   ΦΩαααα...    62871.1(  0.7)     ΩΩαααα...    28244.3(  1.0)
   ΦΦΦααα...    38765.7(  0.9)     ΦΦΩααα...   104432.2(  1.3)
   ΦΩΩααα...    93672.7(  2.0)     ΩΩΩααα...    27960.3(  3.0)
   ΦΦΦΦαα...    24119.9(  1.8)     ΦΦΦΩαα...    86442.5(  2.7)
   ΦΦΩΩαα...   115915.5(  4.0)     ΦΩΩΩαα...    68853.5(  5.9)
   ΩΩΩΩαα...    15261.1(  8.7)     ΦΦΦΦΦα...     7968.1(  3.5)
   ΦΦΦΦΩα...    35537.2(  5.3)     ΦΦΦΩΩα...    63117.5(  7.8)
   ΦΦΩΩΩα...    55684.4( 11.5)     ΦΩΩΩΩα...    24325.9( 16.7)
   ΩΩΩΩΩα...     4190.6( 24.0)     ΦΦΦΦΦΦ...     1087.1(  7.0)
   ΦΦΦΦΦΩ...     5767.0( 10.3)     ΦΦΦΦΩΩ...    12637.2( 15.0)
   ΦΦΦΩΩΩ...    14581.7( 21.6)     ΦΦΩΩΩΩ...     9290.2( 30.6)
   ΦΩΩΩΩΩ...     3073.9( 42.2)     ΩΩΩΩΩΩ...      408.4( 56.0)
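The entries of part C can be approximated with a simple sampling model. The sketch below assumes, as a labeled assumption rather than a statement of how the table was computed, that each library member is an independent draw from the 31^6 stop-free DNA sequences, so that a peptide encoded by m DNA sequences is present with probability 1 - exp(-L·m/31^6) in a library of L members. The function names are illustrative.

```python
from math import expm1, factorial

STOP_FREE_CODONS = 31   # 32 NNK codons minus the TAG stop


def expected_distinct(library_size: float, n_one: int, n_two: int, n_three: int):
    """Expected number of distinct peptides of a class, and percent of class.

    n_one, n_two, n_three count residues drawn from the one-codon (12),
    two-codon (5) and three-codon (3) amino-acid groups respectively.
    """
    n = n_one + n_two + n_three
    arrangements = factorial(n) // (
        factorial(n_one) * factorial(n_two) * factorial(n_three)
    )
    peptides = arrangements * 12 ** n_one * 5 ** n_two * 3 ** n_three
    # Probability that one random clone encodes one particular peptide.
    p_one_clone = (2 ** n_two) * (3 ** n_three) / STOP_FREE_CODONS ** n
    # Poisson approximation to the chance the peptide appears at least once.
    fraction = -expm1(-library_size * p_one_clone)
    return peptides * fraction, 100.0 * fraction


def library_total(library_size: float, length: int = 6) -> float:
    """Expected number of distinct peptides over all composition classes."""
    return sum(
        expected_distinct(library_size, a, b, length - a - b)[0]
        for a in range(length + 1)
        for b in range(length + 1 - a)
    )


if __name__ == "__main__":
    size = 1.0e6
    number, percent = expected_distinct(size, 6, 0, 0)
    print(f"all one-codon residues: {number:.1f} ({percent:.1f}%)")
    total = library_total(size)
    print(f"total = {total:.4E}  % sampled = {100 * total / 20 ** 6:.2f}")
```

Under this model a library of 10^6 members yields roughly 9.7 x 10^5 distinct hexapeptides, in line with the total tabulated for that library size.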



Library size = 3.0000E+06
total = 2.7885E+06    % sampled = 4.36

   αααααα...    10076.4(  0.3)     Φααααα...    50296.9(  0.7)
   Ωααααα...    45190.9(  1.0)     ΦΦαααα...   104432.2(  1.3)
   ΦΩαααα...   187345.5(  2.0)     ΩΩαααα...    83880.9(  3.0)
   ΦΦΦααα...   115256.6(  2.7)     ΦΦΩααα...   309107.9(  4.0)
   ΦΩΩααα...   275413.9(  5.9)     ΩΩΩααα...    81392.5(  8.7)
   ΦΦΦΦαα...    71074.5(  5.3)     ΦΦΦΩαα...   252470.2(  7.8)
   ΦΦΩΩαα...   334106.2( 11.5)     ΦΩΩΩαα...   194606.9( 16.7)
   ΩΩΩΩαα...    41905.9( 24.0)     ΦΦΦΦΦα...    23067.8( 10.3)
   ΦΦΦΦΩα...   101097.3( 15.0)     ΦΦΦΩΩα...   174981.0( 21.6)
   ΦΦΩΩΩα...   148643.7( 30.6)     ΦΩΩΩΩα...    61478.9( 42.2)
   ΩΩΩΩΩα...     9801.0( 56.0)     ΦΦΦΦΦΦ...     3039.6( 19.5)
   ΦΦΦΦΦΩ...    15587.7( 27.7)     ΦΦΦΦΩΩ...    32516.8( 38.5)
   ΦΦΦΩΩΩ...    34975.6( 51.8)     ΦΦΩΩΩΩ...    20215.5( 66.6)
   ΦΩΩΩΩΩ...     5879.9( 80.7)     ΩΩΩΩΩΩ...      667.0( 91.5)
Library size = 1.0000E+07
total = 8.1204E+06    % sampled = 12.69

   αααααα...    33455.9(  1.1)     Φααααα...   166342.4(  2.2)
   Ωααααα...   148871.1(  3.3)     ΦΦαααα...   342685.7(  4.4)
   ΦΩαααα...   609987.6(  6.5)     ΩΩαααα...   269958.3(  9.6)
   ΦΦΦααα...   372371.8(  8.6)     ΦΦΩααα...   983416.4( 12.6)
   ΦΩΩααα...   856471.6( 18.4)     ΩΩΩααα...   244761.5( 26.2)
   ΦΦΦΦαα...   222702.0( 16.5)     ΦΦΦΩαα...   767692.5( 23.7)
   ΦΦΩΩαα...   972324.6( 33.3)     ΦΩΩΩαα...   531651.3( 45.6)
   ΩΩΩΩαα...   104722.3( 59.9)     ΦΦΦΦΦα...    68111.0( 30.3)
   ΦΦΦΦΩα...   281976.3( 41.8)     ΦΦΦΩΩα...   450120.2( 55.6)
   ΦΦΩΩΩα...   342072.1( 70.4)     ΦΩΩΩΩα...   122302.6( 83.9)
   ΩΩΩΩΩα...    16364.0( 93.5)     ΦΦΦΦΦΦ...     8028.0( 51.4)
   ΦΦΦΦΦΩ...    37179.9( 66.1)     ΦΦΦΦΩΩ...    67719.5( 80.3)
   ΦΦΦΩΩΩ...    61580.0( 91.2)     ΦΦΩΩΩΩ...    29586.1( 97.4)
   ΦΩΩΩΩΩ...     7259.5( 99.6)     ΩΩΩΩΩΩ...      728.8(100.0)




Library size = 3.0000E+07
total = 1.8633E+07    % sampled = 29.11

   αααααα...    99247.4(  3.3)     Φααααα...   487990.0(  6.5)
   Ωααααα...   431933.3(  9.6)     ΦΦαααα...   983416.5( 12.6)
   ΦΩαααα...  1712943.0( 18.4)     ΩΩαααα...   734284.6( 26.2)
   ΦΦΦααα...  1023590.0( 23.7)     ΦΦΩααα...  2592866.0( 33.3)
   ΦΩΩααα...  2126605.0( 45.6)     ΩΩΩααα...   558519.0( 59.9)
   ΦΦΦΦαα...   563952.6( 41.8)     ΦΦΦΩαα...  1800481.0( 55.6)
   ΦΦΩΩαα...  2052433.0( 70.4)     ΦΩΩΩαα...   978420.5( 83.9)
   ΩΩΩΩαα...   163640.3( 93.5)     ΦΦΦΦΦα...   148719.7( 66.1)
   ΦΦΦΦΩα...   541755.7( 80.3)     ΦΦΦΩΩα...   738960.1( 91.2)
   ΦΦΩΩΩα...   473377.0( 97.4)     ΦΩΩΩΩα...   145189.7( 99.6)
   ΩΩΩΩΩα...    17491.3(100.0)     ΦΦΦΦΦΦ...    13829.1( 88.5)
   ΦΦΦΦΦΩ...    54058.1( 96.1)     ΦΦΦΦΩΩ...    83726.0( 99.2)
   ΦΦΦΩΩΩ...    67454.5( 99.9)     ΦΦΩΩΩΩ...    30374.5(100.0)
   ΦΩΩΩΩΩ...     7290.0(100.0)     ΩΩΩΩΩΩ...      729.0(100.0)
Library size = 7.6000E+07
total = 3.2125E+07    % sampled = 50.19

   αααααα...   245057.8(  8.2)     Φααααα...  1175010.0( 15.7)
   Ωααααα...  1014733.0( 22.7)     ΦΦαααα...  2255280.0( 29.0)
   ΦΩαααα...  3749112.0( 40.2)     ΩΩαααα...  1504128.0( 53.7)
   ΦΦΦααα...  2142478.0( 49.6)     ΦΦΩααα...  4993247.0( 64.2)
   ΦΩΩααα...  3666785.0( 78.6)     ΩΩΩααα...   840691.9( 90.1)
   ΦΦΦΦαα...  1007002.0( 74.6)     ΦΦΦΩαα...  2825063.0( 87.2)
   ΦΦΩΩαα...  2782358.0( 95.4)     ΦΩΩΩαα...  1154956.0( 99.0)
   ΩΩΩΩαα...   174790.0( 99.9)     ΦΦΦΦΦα...   210475.6( 93.5)
   ΦΦΦΦΩα...   663929.3( 98.4)     ΦΦΦΩΩα...   808298.6( 99.8)
   ΦΦΩΩΩα...   485953.2(100.0)     ΦΩΩΩΩα...   145799.9(100.0)
   ΩΩΩΩΩα...    17496.0(100.0)     ΦΦΦΦΦΦ...    15559.9( 99.6)
   ΦΦΦΦΦΩ...    56234.9(100.0)     ΦΦΦΦΩΩ...    84374.6(100.0)
   ΦΦΦΩΩΩ...    67500.0(100.0)     ΦΦΩΩΩΩ...    30375.0(100.0)
   ΦΩΩΩΩΩ...     7290.0(100.0)     ΩΩΩΩΩΩ...      729.0(100.0)

Library size = 1.0000E+08
total = 3.6537E+07    % sampled = 57.09

   αααααα...   318185  ( 10.7)     Φααααα...  1506161.0( 20.2)
   Ωααααα...  1284677.0( 28.7)     ΦΦαααα...  2821285.0( 36.3)
   ΦΩαααα...  4585163.0( 49.1)     ΩΩαααα...  1783932.0( 63.7)
   ΦΦΦααα...  2566085.0( 59.4)     ΦΦΩααα...  5764391.0( 74.1)
   ΦΩΩααα...  4051713.0( 86.8)     ΩΩΩααα...   888584.3( 95.2)
   ΦΦΦΦαα...  1127473.0( 83.5)     ΦΦΦΩαα...  3023170.0( 93.3)
   ΦΦΩΩαα...  2865517.0( 98.3)     ΦΩΩΩαα...  1163743.0( 99.8)
   ΩΩΩΩαα...   174941.0(100.0)     ΦΦΦΦΦα...   218886.6( 97.3)
   ΦΦΦΦΩα...   671976.9( 99.6)     ΦΦΦΩΩα...   809757.3(100.0)
   ΦΦΩΩΩα...   485997.5(100.0)     ΦΩΩΩΩα...   145800.0(100.0)
   ΩΩΩΩΩα...    17496.0(100.0)     ΦΦΦΦΦΦ...    15613.5( 99.9)
   ΦΦΦΦΦΩ...    56248.9(100.0)     ΦΦΦΦΩΩ...    84375.0(100.0)
   ΦΦΦΩΩΩ...    67500.0(100.0)     ΦΦΩΩΩΩ...    30375.0(100.0)
   ΦΩΩΩΩΩ...     7290.0(100.0)     ΩΩΩΩΩΩ...      729.0(100.0)
Library size = 3.0000E+08
total = 5.2634E+07    % sampled = 82.24

   αααααα...   856451.3( 28.7)     Φααααα...  3668130.0( 49.1)
   Ωααααα...  2854291.0( 63.7)     ΦΦαααα...  5764391.0( 74.1)
   ΦΩαααα...  8103426.0( 86.8)     ΩΩαααα...  2665753.0( 95.2)
   ΦΦΦααα...  4030893.0( 93.3)     ΦΦΩααα...  7641378.0( 98.3)
   ΦΩΩααα...  4654972.0( 99.8)     ΩΩΩααα...   933018.6(100.0)
   ΦΦΦΦαα...  1343954.0( 99.6)     ΦΦΦΩαα...  3239029.0(100.0)
   ΦΦΩΩαα...  2915985.0(100.0)     ΦΩΩΩαα...  1166400.0(100.0)
   ΩΩΩΩαα...   174960.0(100.0)     ΦΦΦΦΦα...   224995.5(100.0)
   ΦΦΦΦΩα...   674999.9(100.0)     ΦΦΦΩΩα...   810000.0(100.0)
   ΦΦΩΩΩα...   486000.0(100.0)     ΦΩΩΩΩα...   145800.0(100.0)
   ΩΩΩΩΩα...    17496.0(100.0)     ΦΦΦΦΦΦ...    15625.0(100.0)
   ΦΦΦΦΦΩ...    56250.0(100.0)     ΦΦΦΦΩΩ...    84375.0(100.0)
   ΦΦΦΩΩΩ...    67500.0(100.0)     ΦΦΩΩΩΩ...    30375.0(100.0)
   ΦΩΩΩΩΩ...     7290.0(100.0)     ΩΩΩΩΩΩ...      729.0(100.0)



Library size = 1.0000E+09
total = 6.1999E+07    % sampled = 96.87

   αααααα...  2018278.0( 67.6)     Φααααα...  6680917.0( 89.5)
   Ωααααα...  4326519.0( 96.6)     ΦΦαααα...  7690221.0( 98.9)
   ΦΩαααα...  9320389.0( 99.9)     ΩΩαααα...  2799250.0(100.0)
   ΦΦΦααα...  4319475.0(100.0)     ΦΦΩααα...  7775990.0(100.0)
   ΦΩΩααα...  4665600.0(100.0)     ΩΩΩααα...   933120.0(100.0)
   ΦΦΦΦαα...  1350000.0(100.0)     ΦΦΦΩαα...  3240000.0(100.0)
   ΦΦΩΩαα...  2916000.0(100.0)     ΦΩΩΩαα...  1166400.0(100.0)
   ΩΩΩΩαα...   174960.0(100.0)     ΦΦΦΦΦα...   225000.0(100.0)
   ΦΦΦΦΩα...   675000.0(100.0)     ΦΦΦΩΩα...   810000.0(100.0)
   ΦΦΩΩΩα...   486000.0(100.0)     ΦΩΩΩΩα...   145800.0(100.0)
   ΩΩΩΩΩα...    17496.0(100.0)     ΦΦΦΦΦΦ...    15625.0(100.0)
   ΦΦΦΦΦΩ...    56250.0(100.0)     ΦΦΦΦΩΩ...    84375.0(100.0)
   ΦΦΦΩΩΩ...    67500.0(100.0)     ΦΦΩΩΩΩ...    30375.0(100.0)
   ΦΩΩΩΩΩ...     7290.0(100.0)     ΩΩΩΩΩΩ...      729.0(100.0)
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Table 130: Sampling of a Library encoded by (NNK) 

(continued) 



Library size = 3.0000E+09

total = 6.3890E+07     % sampled = 99.83

αααααα    2884346.0 ( 96.6)     Φααααα    7456311.0 ( 99.9)
Ωααααα    4478800.0 (100.0)     ΦΦαααα    7775990.0 (100.0)
ΦΩαααα    9331200.0 (100.0)     ΩΩαααα    2799360.0 (100.0)
ΦΦΦααα    4320000.0 (100.0)     ΦΦΩααα    7776000.0 (100.0)
ΦΩΩααα    4665600.0 (100.0)     ΩΩΩααα     933120.0 (100.0)
ΦΦΦΦαα    1350000.0 (100.0)     ΦΦΦΩαα    3240000.0 (100.0)
ΦΦΩΩαα    2916000.0 (100.0)     ΦΩΩΩαα    1166400.0 (100.0)
ΩΩΩΩαα     174960.0 (100.0)     ΦΦΦΦΦα     225000.0 (100.0)
ΦΦΦΦΩα     675000.0 (100.0)     ΦΦΦΩΩα     810000.0 (100.0)
ΦΦΩΩΩα     486000.0 (100.0)     ΦΩΩΩΩα     145800.0 (100.0)
ΩΩΩΩΩα      17496.0 (100.0)     ΦΦΦΦΦΦ      15625.0 (100.0)
ΦΦΦΦΦΩ      56250.0 (100.0)     ΦΦΦΦΩΩ      84375.0 (100.0)
ΦΦΦΩΩΩ      67500.0 (100.0)     ΦΦΩΩΩΩ      30375.0 (100.0)
ΦΩΩΩΩΩ       7290.0 (100.0)     ΩΩΩΩΩΩ        729.0 (100.0)
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Table 130, continued 

D. Formulae for tabulated quantities.

Lsize is the number of independent transformants.
31**6 is 31 to sixth power; 6*3 means 6 times 3.
A = Lsize/(31**6)
α can be one of [WMFYCIKDENHQ.]
Φ can be one of [PTAVG]
Ω can be one of [SLR]

F0 = (12)**6     F1 = (12)**5     F2 = (12)**4
F3 = (12)**3     F4 = (12)**2     F5 = (12)
F6 = 1

αααααα = F0 * (1-exp(-A))
Φααααα = 6 * 5 * F1 * (1-exp(-2*A))
Ωααααα = 6 * 3 * F1 * (1-exp(-3*A))
ΦΦαααα = (15) * 5**2 * F2 * (1-exp(-4*A))
ΦΩαααα = (6*5)*5*3 * F2 * (1-exp(-6*A))
ΩΩαααα = (15) * 3**2 * F2 * (1-exp(-9*A))
ΦΦΦααα = (20)*(5**3) * F3 * (1-exp(-8*A))
ΦΦΩααα = (60)*(5*5*3)*F3*(1-exp(-12*A))
ΦΩΩααα = (60)*(5*3*3)*F3*(1-exp(-18*A))
ΩΩΩααα = (20)*(3)**3*F3*(1-exp(-27*A))
ΦΦΦΦαα = (15)*(5)**4*F4*(1-exp(-16*A))
ΦΦΦΩαα = (60)*(5)**3*3*F4*(1-exp(-24*A))
ΦΦΩΩαα = (90)*(5*5*3*3)*F4*(1-exp(-36*A))
ΦΩΩΩαα = (60)*(5*3*3*3)*F4*(1-exp(-54*A))
ΩΩΩΩαα = (15)*(3)**4 * F4 * (1-exp(-81*A))
ΦΦΦΦΦα = (6)*(5)**5 * F5 * (1-exp(-32*A))
ΦΦΦΦΩα = 30*5*5*5*5*3*F5*(1-exp(-48*A))
ΦΦΦΩΩα = 60*5*5*5*3*3*F5*(1-exp(-72*A))
ΦΦΩΩΩα = 60*5*5*3*3*3*F5*(1-exp(-108*A))
ΦΩΩΩΩα = 30*5*3*3*3*3*F5*(1-exp(-162*A))
ΩΩΩΩΩα = 6*3*3*3*3*3*F5*(1-exp(-243*A))
ΦΦΦΦΦΦ = 5**6 * (1-exp(-64*A))
ΦΦΦΦΦΩ = 6*3*5**5 * (1-exp(-96*A))
ΦΦΦΦΩΩ = 15*3*3*5**4 * (1-exp(-144*A))
ΦΦΦΩΩΩ = 20*3**3*5**3 * (1-exp(-216*A))
ΦΦΩΩΩΩ = 15*3**4*5**2 * (1-exp(-324*A))
ΦΩΩΩΩΩ = 6*3**5*5 * (1-exp(-486*A))
ΩΩΩΩΩΩ = 3**6 * (1-exp(-729*A))

total = αααααα + Φααααα + Ωααααα + ΦΦαααα + ΦΩαααα +
        ΩΩαααα + ΦΦΦααα + ΦΦΩααα + ΦΩΩααα + ΩΩΩααα +
        ΦΦΦΦαα + ΦΦΦΩαα + ΦΦΩΩαα + ΦΩΩΩαα + ΩΩΩΩαα +
        ΦΦΦΦΦα + ΦΦΦΦΩα + ΦΦΦΩΩα + ΦΦΩΩΩα + ΦΩΩΩΩα +
        ΩΩΩΩΩα + ΦΦΦΦΦΦ + ΦΦΦΦΦΩ + ΦΦΦΦΩΩ + ΦΦΦΩΩΩ +
        ΦΦΩΩΩΩ + ΦΩΩΩΩΩ + ΩΩΩΩΩΩ
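
The formulae of part D all follow one pattern: the number of amino-acid sequences in a class, times the chance that at least one of the DNA sequences encoding a given member was sampled. The sketch below is a compact restatement of that pattern, not the program used to generate Table 130. The function name `expected_classes` and the label format are illustrative assumptions; the class sizes (12, 5, 3) and the 31-codon denominator are taken from the definitions above.

```python
from math import comb, exp

# Class sizes for NNK as defined in part D: amino acids encoded by
# one (alpha), two (Phi), or three (Omega) codons.
N_ALPHA, N_PHI, N_OMEGA = 12, 5, 3
N_CODONS = 31   # denominator used in A = Lsize/(31**6)
LENGTH = 6      # number of variegated codons


def expected_classes(lsize):
    """Expected number of distinct sequences seen in each class (hypothetical helper)."""
    a = lsize / N_CODONS**LENGTH
    out = {}
    for n_phi in range(LENGTH + 1):
        for n_omega in range(LENGTH + 1 - n_phi):
            n_alpha = LENGTH - n_phi - n_omega
            # Ways to place the Phi and Omega positions among six codons.
            arrangements = comb(LENGTH, n_phi) * comb(LENGTH - n_phi, n_omega)
            n_seq = (arrangements * N_PHI**n_phi * N_OMEGA**n_omega
                     * N_ALPHA**n_alpha)
            # Each sequence in the class is encoded by this many DNA sequences.
            dna_per_seq = 2**n_phi * 3**n_omega
            label = "Φ" * n_phi + "Ω" * n_omega + "α" * n_alpha
            out[label] = n_seq * (1.0 - exp(-dna_per_seq * a))
    return out


if __name__ == "__main__":
    classes = expected_classes(3.0e9)
    total = sum(classes.values())
    print(f"total = {total:.4E}   % sampled = {100 * total / 20**6:.2f}")
```

Under these assumptions the 28 classes sum to the tabulated total for each library size, which gives a quick check on any individual entry.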
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10 



Table 131: Sampling of a Library
Encoded by (NNT)^4 (NNG)^2

X can be F, S, Y, C, L, P, H, R, I, T, N, V, A, D, G

T can be L^2, R^2, S, W, P, Q, M, T, K, V, A, E, G

Library comprises 8.55·10^6 amino-acid sequences; 1.47·10^7
DNA sequences.

Total number of possible aa sequences = 8,555,625

x = LVPTARGFYCHIND
S = S
θ = VPTAGWQMKES
Ω = LR

The first, second, fifth, and sixth positions can
hold x or S; the third and fourth positions can hold θ or
Ω. I have lumped sequences by the number of xs, Ss, θs,
and Ωs.

For example xxθΩSS stands for:
[xxθΩSS, xSθΩxS, xSθΩSx, SSθΩxx, SxθΩxS, SxθΩSx,
 xxΩθSS, xSΩθxS, xSΩθSx, SSΩθxx, SxΩθxS, SxΩθSx]

The following table shows the likelihood that any
particular DNA sequence will fall into one of the defined
classes.
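
The class likelihoods can be reproduced from codon counts alone. The sketch below assumes 16 NNT codons (two of them serine) and 15 NNG codons once the stop codon is discounted (two each for L and R), which matches the 1.47·10^7 DNA sequences quoted above. The function name `class_probability` is illustrative only.

```python
from math import comb


def class_probability(n_s, n_omega):
    """Chance that a random DNA sequence has n_s serines at the four NNT
    positions and n_omega two-codon residues (L or R) at the two NNG positions."""
    # NNT positions: 14 of 16 codons give an x residue, 2 of 16 give S.
    p_nnt = comb(4, n_s) * (14 / 16) ** (4 - n_s) * (2 / 16) ** n_s
    # NNG positions: 11 of 15 codons give a theta residue, 4 of 15 give L or R.
    p_nng = comb(2, n_omega) * (11 / 15) ** (2 - n_omega) * (4 / 15) ** n_omega
    return p_nnt * p_nng


# Keys are (number of S, number of Omega); (0, 0) is the class xxθθxx.
probs = {(s, w): class_probability(s, w) for s in range(5) for w in range(3)}

if __name__ == "__main__":
    for key, p in sorted(probs.items()):
        print(key, f"{p:.4E}")
```

The fifteen probabilities sum to one, so the table partitions the whole DNA library.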



Library size = 1.0          Sampling = .00001%

total.....   1.0000E+00     % sampled.   1.1688E-07
xxθθxx....   3.1524E-01     xxθΩxx....   2.2926E-01
xxΩΩxx....   4.1684E-02     xxθθxS....   1.8013E-01
xxθΩxS....   1.3101E-01     xxΩΩxS....   2.3819E-02
xxθθSS....   3.8600E-02     xxθΩSS....   2.8073E-02
xxΩΩSS....   5.1042E-03     xSθθSS....   3.6762E-03
xSθΩSS....   2.6736E-03     xSΩΩSS....   4.8611E-04
SSθθSS....   1.3129E-04     SSθΩSS....   9.5486E-05
SSΩΩSS....   1.7361E-05
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Table 131: Sampling of a Library 
Encoded by (NNT) 4 (NNG) 2 
(continued) 

The following sections show how many sequences of
each class are expected for libraries of different sizes.
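
As a rough cross-check on the sections that follow, the expected number of distinct amino-acid sequences per class can be estimated with the same Poisson form used for Table 130. This is a sketch under that assumption; `expected_distinct` is a hypothetical helper, and the codon counts (16 for NNT, 15 for NNG) are inferred from the 1.47·10^7 DNA sequences stated for this library.

```python
from math import comb, exp

N_DNA = 16**4 * 15**2  # 14,745,600 DNA sequences in the (NNT)^4 (NNG)^2 library


def expected_distinct(lsize):
    """Return ({(n_S, n_Omega): expected count}, total) for a library of lsize clones."""
    a = lsize / N_DNA
    rows = {}
    for n_s in range(5):
        for n_omega in range(3):
            # Amino-acid sequences in the class: 14 choices per x, 1 per S,
            # 11 per theta, 2 per Omega (L or R).
            n_seq = (comb(4, n_s) * 14 ** (4 - n_s)
                     * comb(2, n_omega) * 11 ** (2 - n_omega) * 2**n_omega)
            # S, L and R each have two codons, doubling the DNA multiplicity.
            dna_per_seq = 2 ** (n_s + n_omega)
            rows[(n_s, n_omega)] = n_seq * (1.0 - exp(-dna_per_seq * a))
    return rows, sum(rows.values())


if __name__ == "__main__":
    rows, total = expected_distinct(1.0e5)
    print(f"total = {total:.4E}  fraction sampled = {total / 8555625:.4E}")
```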



Library size = 1.0000E+05

total = 9.9137E+04     fraction sampled = 1.1587E-02

Type        Number    %         Type        Number    %

xxθθxx     31416.9 (  0.7)      xxθΩxx     22771.4 (  1.3)
xxΩΩxx      4112.4 (  2.7)      xxθθxS     17891.8 (  1.3)
xxθΩxS     12924.6 (  2.7)      xxΩΩxS      2318.5 (  5.3)
xxθθSS      3808.1 (  2.7)      xxθΩSS      2732.5 (  5.3)
xxΩΩSS       483.7 ( 10.3)      xSθθSS       357.8 (  5.3)
xSθΩSS       253.4 ( 10.3)      xSΩΩSS        43.7 ( 19.5)
SSθθSS        12.4 ( 10.3)      SSθΩSS         8.6 ( 19.5)
SSΩΩSS         1.4 ( 35.2)

Library size = 1.0000E+06

total = 9.2064E+05     fraction sampled = 1.0761E-01

xxθθxx    304783.9 (  6.6)      xxθΩxx    214394.0 ( 12.7)
xxΩΩxx     36508.6 ( 23.8)      xxθθxS    168452.5 ( 12.7)
xxθΩxS    114741.4 ( 23.8)      xxΩΩxS     18383.8 ( 41.9)
xxθθSS     33807.7 ( 23.8)      xxθΩSS     21666.6 ( 41.9)
xxΩΩSS      3114.6 ( 66.2)      xSθθSS      2837.3 ( 41.9)
xSθΩSS      1631.5 ( 66.2)      xSΩΩSS       198.4 ( 88.6)
SSθθSS        80.1 ( 66.2)      SSθΩSS        39.0 ( 88.6)
SSΩΩSS         3.9 ( 98.7)

Library size = 3.0000E+06

total = 2.3880E+06     fraction sampled = 2.7912E-01

xxθθxx    855709.5 ( 18.4)      xxθΩxx    565051.6 ( 33.4)
xxΩΩxx     85564.7 ( 55.7)      xxθθxS    443969.1 ( 33.4)
xxθΩxS    268917.8 ( 55.7)      xxΩΩxS     35281.3 ( 80.4)
xxθθSS     79234.7 ( 55.7)      xxθΩSS     41581.5 ( 80.4)
xxΩΩSS      4522.6 ( 96.1)      xSθθSS      5445.2 ( 80.4)
xSθΩSS      2369.0 ( 96.1)      xSΩΩSS       223.7 ( 99.9)
SSθθSS       116.3 ( 96.1)      SSθΩSS        43.9 ( 99.9)
SSΩΩSS         4.0 (100.0)
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Table 131: Sampling of a Library 
Encoded by (NNT) 4 (NNG) 2 
(continued) 



Library size = 8.5556E+06

total = 4.9303E+06     fraction sampled = 5.7626E-01

xxθθxx   2046301.0 ( 44.0)      xxθΩxx   1160645.0 ( 68.7)
xxΩΩxx    138575.9 ( 90.2)      xxθθxS    911935.6 ( 68.7)
xxθΩxS    435524.3 ( 90.2)      xxΩΩxS     43480.7 ( 99.0)
xxθθSS    128324.1 ( 90.2)      xxθΩSS     51245.1 ( 99.0)
xxΩΩSS      4703.6 (100.0)      xSθθSS      6710.7 ( 99.0)
xSθΩSS      2463.8 (100.0)      xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)      SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)

Library size = 1.0000E+07

total = 5.3667E+06     fraction sampled = 6.2727E-01

xxθθxx   2289093.0 ( 49.2)      xxθΩxx   1254877.0 ( 74.2)
xxΩΩxx    143467.0 ( 93.4)      xxθθxS    985974.9 ( 74.2)
xxθΩxS    450896.3 ( 93.4)      xxΩΩxS     43710.7 ( 99.6)
xxθθSS    132853.4 ( 93.4)      xxθΩSS     51516.1 ( 99.6)
xxΩΩSS      4703.9 (100.0)      xSθθSS      6746.2 ( 99.6)
xSθΩSS      2464.0 (100.0)      xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)      SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)

Library size = 3.0000E+07

total = 7.8961E+06     fraction sampled = 9.2291E-01

xxθθxx   4040589.0 ( 86.9)      xxθΩxx   1661409.0 ( 98.3)
xxΩΩxx    153619.1 (100.0)      xxθθxS   1305393.0 ( 98.3)
xxθΩxS    482802.9 (100.0)      xxΩΩxS     43904.0 (100.0)
xxθθSS    142254.4 (100.0)      xxθΩSS     51744.0 (100.0)
xxΩΩSS      4704.0 (100.0)      xSθθSS      6776.0 (100.0)
xSθΩSS      2464.0 (100.0)      xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)      SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)
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Table 131: Sampling of a Library 
Encoded by (NNT) 4 (NNG) 2 
(continued) 



Library size = 5.0000E+07

total = 8.3956E+06     fraction sampled = 9.8130E-01

xxθθxx   4491779.0 ( 96.6)      xxθΩxx   1688387.0 ( 99.9)
xxΩΩxx    153663.8 (100.0)      xxθθxS   1326590.0 ( 99.9)
xxθΩxS    482943.4 (100.0)      xxΩΩxS     43904.0 (100.0)
xxθθSS    142295.8 (100.0)      xxθΩSS     51744.0 (100.0)
xxΩΩSS      4704.0 (100.0)      xSθθSS      6776.0 (100.0)
xSθΩSS      2464.0 (100.0)      xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)      SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)

Library size = 1.0000E+08

total = 8.5503E+06     fraction sampled = 9.9938E-01

xxθθxx   4643063.0 ( 99.9)      xxθΩxx   1690302.0 (100.0)
xxΩΩxx    153664.0 (100.0)      xxθθxS   1328094.0 (100.0)
xxθΩxS    482944.0 (100.0)      xxΩΩxS     43904.0 (100.0)
xxθθSS    142296.0 (100.0)      xxθΩSS     51744.0 (100.0)
xxΩΩSS      4704.0 (100.0)      xSθθSS      6776.0 (100.0)
xSθΩSS      2464.0 (100.0)      xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)      SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)
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Table 132: Relative efficiencies of 
various simple variegation codons 



vgCodon                        Number of codons

                        5              6              7

                    #DNA/#AA       #DNA/#AA       #DNA/#AA
                     [#DNA]         [#DNA]         [#DNA]
                     (#AA)          (#AA)          (#AA)

NNK                   8.95          13.86          21.49
assuming           [2.86·10^7]   [8.87·10^8]   [2.75·10^10]
stops vanish       (3.2·10^6)    (6.4·10^7)    (1.28·10^9)

NNT                   1.38           1.47           1.57
                   [1.05·10^6]   [1.68·10^7]   [2.68·10^8]
                   (7.59·10^5)   (1.14·10^7)   (1.71·10^8)

NNG                   2.04           2.36           2.72
assuming           [7.59·10^5]   [1.14·10^7]   [1.71·10^8]
stops vanish       (3.7·10^5)    (4.83·10^6)   (6.27·10^7)
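
The entries of Table 132 are powers of two small integers per codon type: the number of usable codons and the number of distinct amino acids they encode. The sketch below assumes the per-codon counts implied by the table (31 and 20 for NNK, 16 and 15 for NNT, 15 and 13 for NNG, with stop codons discounted). The names `CODON_COUNTS` and `efficiency` are illustrative.

```python
# (usable DNA codons, distinct amino acids) per variegated codon,
# as implied by the bracketed and parenthesised entries of Table 132.
CODON_COUNTS = {
    "NNK": (31, 20),
    "NNT": (16, 15),
    "NNG": (15, 13),
}


def efficiency(vg_codon, n_codons):
    """Return (#DNA/#AA, #DNA, #AA) for n_codons variegated positions."""
    dna_per_codon, aa_per_codon = CODON_COUNTS[vg_codon]
    n_dna = dna_per_codon**n_codons
    n_aa = aa_per_codon**n_codons
    return n_dna / n_aa, n_dna, n_aa


if __name__ == "__main__":
    for name in CODON_COUNTS:
        for n in (5, 6, 7):
            ratio, n_dna, n_aa = efficiency(name, n)
            print(f"{name} n={n}: {ratio:.2f} [{n_dna:.2E}] ({n_aa:.2E})")
```

A ratio close to 1 means few redundant DNA sequences per amino-acid sequence, which is why NNT compares favourably with NNK in the table.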
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Table 140. 


Effect of anti-BPTI IgG on phage titer.

Phage                                 +Anti-BPTI        Eluted
Strain         Input    +Anti-BPTI    +Protein A (a)    Phage

M13MP18        100 (b)      98            92            7x10^-4
BPTI.3         100          26            21            6
M13MB48 (c)    100          90            36            0.8
M13MB48 (d)    100          60            40            2.6

(a) Protein A-agarose beads.

(b) Percentage of input phage measured as plaque
forming units

(c) Batch number 3

(d) Batch number 4



Table 141. Effect of anti-BPTI or protein A on phage
titer.

                 No         +Anti-   +Protein A   +Anti-BPTI
Strain   Input   Addition   BPTI     (a)          +Protein A

M13MP18 100(b) 107 105 72 65 

M13MB48fb)100 92 7.10^ 3. 58 <10^ 

(a) Protein A-agarose beads 

(b) Percentage of input phage measured as plaque 
forming units 

(c) Batch number 5 
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Table 142 Affect of anti-BPTI and non-immune serum on 
phage titer 



                                  +NRS    +Anti-BPTI       +NRS
Strain         Input    +Anti-    (a)     +Protein A (b)   +Protein A
                        BPTI

M13MP18        100 (c)    65      104         71              88
M13MB48 (d)    100        30      125         13             121
M13MB48 (e)    100         2      105          0.7           110

(a) Purified IgG from normal rabbit serum.

(b) Protein A-agarose beads.

(c) Percentage of input phage measured as plaque
forming units

(d) Batch number 4

(e) Batch number 5
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Table , 143. Loss in titer of display phage with 
anhydrotrypsin . 



               Anhydrotrypsin Beads      Streptavidin Beads

Strain         Start      Post           Start      Post
                          Incubation                Incubation

M13MP18        100 (a)    121            ND         ND
M13MB48        100         58            100        98
5AA Pool       100         44            100        93

(a) Plaque forming units expressed as a percentage of
input.

Table 144. Binding of Display Phage to Anhydrotrypsin.

Experiment 1.

Strain         Eluted Phage (a)    Relative to M13MP18

M13MP18             0.2                  1.0
BPTI-IIIMK          7.9                 39.5
M13MB48            11.2                 56.0

Experiment 2.

Strain         Eluted Phage (a)    Relative to M13mp18

M13mp18             0.3                  1.0
BPTI-IIIMK         12.0                 40.0
M13MB56            17.0                 56.7

(a) Plaque forming units acid eluted from beads, expressed
as a percentage of the input.
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Table 145. Binding of Display Phage to Anhydrotrypsin or 
Trypsin. 



               Anhydrotrypsin Beads        Trypsin Beads

Strain         Eluted      Relative        Eluted      Relative
               Phage (a)   Binding (b)     Phage       Binding

M13MP18          0.1           1           2.3x10^-4   1.0
BPTI-IIIMK       9.1          91           1.17        5x10^3
M13.3X7         25.0         250           1.4         6x10^3
M13.3X11         9.2          92           0.27        1.2x10^3

(a) Plaque forming units eluted from beads, expressed as a
percentage of the input.

(b) Relative to the non-display phage, M13MP18.



Table 146. Binding of Display Phage to Trypsin or Human
Neutrophil Elastase.

               Trypsin Beads               HNE Beads

Strain         Eluted      Relative        Eluted      Relative
               Phage (a)   Binding (b)     Phage       Binding

M13MP18        5x10^-4         1           3x10^-4     1.0
BPTI-IIIMK     1.0          2000           5x10^-3     16.7
M13MB48        0.13          260           9x10^-3     30.0
M13.3X7        1.15         2300           1x10^-3     3.3
M13.3X11       0.8          1600           2x10^-3     6.7
BPTI3.CL (c)   1x10^-3         2           4.1         1.4x10^4

(a) Plaque forming units acid eluted from the beads,
expressed as a percentage of input.

(b) Relative to the non-display phage, M13MP18.

(c) BPTI-IIIMK (K15L MGNG)
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Table 155 

Distance in Å between alpha carbons in octapeptides:

Extended Strand: angle of Cα1-Cα2-Cα3 = 138°

        1      2      3      4      5      6      7      8
1
2      3.8
3      7.1    3.8
4     10.7    7.1    3.8
5     14.2   10.7    7.1    3.8
6     17.7   14.1   10.7    7.1    3.8
7     21.2   17.7   14.1   10.6    7.0    3.8
8     24.6   20.9   17.5   13.9   10.6    7.0    3.8

Reverse turn between residues 4 and 5.

        1      2      3      4      5      6      7      8
1
2      3.8
3      7.1    3.8
4     10.6    7.0    3.8
5     11.6    8.0    6.1    3.8
6      9.0    5.8    5.5    5.6    3.8
7      6.2    4.1    6.3    8.0    7.0    3.8
8      5.8    6.0    9.1   11.6   10.7    7.2    3.8

Alpha helix: angle of Cα1-Cα2-Cα3 = 93°

        1      2      3      4      5      6      7      8
1
2      3.8
3      5.5    3.8
4      5.1    5.4    3.8
5      6.6    5.3    5.5    3.8
6      9.3    7.0    5.6    5.5    3.8
7     10.4    9.3    6.9    5.4    5.5    3.8
8     11.3   10.7    9.5    6.8    5.6    5.6    3.8
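
The first two diagonals of each matrix follow from simple geometry. Adjacent alpha carbons are 3.8 Å apart, and the separation of residues i and i+2 is fixed by that length and the Cα-Cα-Cα angle. The sketch below checks only this i to i+2 distance; longer-range entries also depend on the torsion angles and are not reproduced. The function name `ca_distance_i_to_i_plus_2` is illustrative.

```python
from math import radians, sin

CA_CA = 3.8  # distance in Angstroms between adjacent alpha carbons (Table 155)


def ca_distance_i_to_i_plus_2(angle_deg):
    """Base of the isosceles triangle formed by three consecutive alpha
    carbons whose two equal sides are CA_CA and whose apex angle is angle_deg."""
    return 2.0 * CA_CA * sin(radians(angle_deg) / 2.0)


if __name__ == "__main__":
    print(f"extended strand (138 deg): {ca_distance_i_to_i_plus_2(138):.1f}")
    print(f"alpha helix      (93 deg): {ca_distance_i_to_i_plus_2(93):.1f}")
```

The results agree with the 7.1 Å and 5.5 Å entries in the second diagonal of the extended-strand and alpha-helix matrices.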





Table 156 
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Distances between alpha carbons in closed mini-proteins of 
the form disulfide cyclo(CXXXXC) 



Minimum distance

        1      2      3      4      5      6
1
2      3.8
3      5.9    3.8
4      5.6    6.0    3.8
5      4.7    5.9    6.0    3.8
6      4.8    5.3    5.1    5.2    3.8

Average distance

        1      2      3      4      5      6
1
2      3.8
3      6.3    3.8
4      7.5    6.4    3.8
5      7.1    7.5    6.3    3.8
6      5.6    7.5    7.7    6.4    3.8

Maximum distance

        1      2      3      4      5      6
1
2      3.8
3      6.7    3.8
4      9.0    6.9    3.8
5      8.7    8.8    6.8    3.8
6      6.6    9.2    9.1    6.8    3.8
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Table 160: pH Profile of BPTI-III MK phage and Shad 1 
phage binding to Cat G beads. 

BPTI-IIIMK

pH      Total pfu in Fraction    Percentage of Input

7         3.7x10^5                 3.7x10^-2
6         3.1x10^5                 3.1x10^-2
5         1.4x10^5                 1.4x10^-2
4.5       3.1x10^4                 3.1x10^-3
4         7.1x10^3                 7.1x10^-4
3.5       2.6x10^3                 2.6x10^-4
3         2.5x10^3                 2.5x10^-4
2.5       8.8x10^2                 8.8x10^-5
2         7.6x10^2                 7.6x10^-5

(total input = 1x10^9 phage)

Shad 1

7         2.5x10^5                 1.1x10^-2
6         6.3x10^4                 2.7x10^-3
5         7.4x10^4                 3.1x10^-3
4.5       7.1x10^4                 3.0x10^-3
4         4.1x10^4                 1.7x10^-3
3.5       3.3x10^4                 1.4x10^-3
3         2.5x10^3                 1.1x10^-4
2.5       1.4x10^4                 5.7x10^-4
2         5.2x10^3                 2.2x10^-4

(total input = 2.35x10^9 phage)
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TABLE 201 

Elution of Bound Fusion Phage from Immobilized
Active Trypsin

                          Total Plaque-      Percent of
Type of                   Forming Units      Input Phage
Phage          Buffer     Recovered in       Recovered       Ratio
                          Elution Buffer

BPTI-III MK    CBS        8.80·10^7          4.7·10^-1       1675
MK             CBS        1.35·10^6          2.8·10^-4

BPTI-III MK    TBS        1.32·10^8          7.2·10^-1       2103
MK             TBS        1.48·10^6          3.4·10^-4

The total input for BPTI-III MK phage was 1.85·10^10
plaque-forming units while the input for MK phage was
4.65·10^11 plaque-forming units.
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TABLE 202 

Elution of BPTI-III MK and BPTI(K15L)-III MA Phage from
Immobilized Trypsin and HNE

                                    Total Plaque-      Percentage of
Type of               Immobilized   Forming Units      Input Phage
Phage                 Protease      in Elution         Recovered
                                    Fraction

BPTI-III MK           Trypsin       2.1·10^7           4.1·10^-1
BPTI-III MK           HNE           2.6·10^5           5·10^-3
BPTI(K15L)-III MA     Trypsin       5.2·10^4           5·10^-3
BPTI(K15L)-III MA     HNE           1.0·10^6           1.0·10^-1

The total input of BPTI-III MK phage was 5.1·10^9 pfu and
the input of BPTI(K15L)-III MA phage was 9.6·10^8 pfu.
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TABLE 203 

Effect of pH on the Dissociation of
Bound BPTI-III MK and
BPTI(K15L)-III MA Phage from Immobilized HNE

          BPTI-III MK                    BPTI(K15L)-III MA

          Total Plaque-    %             Total Plaque-    %
pH        Forming Units    of Input      Forming Units    of Input
          in Fraction      Phage         in Fraction      Phage

7.0       5.0·10^4         2·10^-3       1.7·10^5         3.2·10^-2
6.0       3.8·10^4         2·10^-3       4.5·10^5         8.6·10^-2
5.0       3.5·10^4         1·10^-3       2.1·10^6         4.0·10^-1
4.0       3.0·10^4         1·10^-3       4.3·10^6         [illegible]·10^-1
3.0       1.4·10^4         1·10^-3       1.1·10^6         2.1·10^-1
2.2       2.9·10^4         1·10^-3       5.9·10^4         1.1·10^-2

Percentage of                            Percentage of
Input Phage     = 8.0·10^-3              Input Phage     = 1.56
Recovered                                Recovered

The total input of BPTI-III MK phage was
0.030 ml x (8.6·10^10 pfu/ml) = 2.6·10^9.

The total input of BPTI(K15L)-III MA phage was
0.030 ml x (1.7·10^10 pfu/ml) = 5.2·10^8.

Given that the infectivity of BPTI (K15L) -III MA phage is 5 
fold lower than that of BPTI-III MK phage, the phage 
inputs utilized above ensure that an equivalent number of 
phage particles are added to the immobilized HNE. 
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TABLE 204 

Effect of Mutation of Residues 39 to 42 of BPTI
on the ability of BPTI(K15L)-III MA to Bind to
Immobilized HNE

          BPTI(K15L)-III MA              BPTI(K15L,MGNG)-III MA

          Total Plaque-    %             Total Plaque-    %
pH        Forming Units    Input         Forming Units    Input

7.0       3.0·10^5         8.2·10^-2     4.5·10^5         1.63·10^-1
6.0       3.6·10^5         1.00·10^-1    6.3·10^5         2.27·10^-1
5.5       5.3·10^5         1.46·10^-1    7.3·10^5         2.64·10^-1
5.0       5.6·10^5         1.52·10^-1    8.7·10^5         3.16·10^-1
4.75      9.9·10^5         2.76·10^-1    1.3·10^6         4.60·10^-1
4.5       3.1·10^5         8.5·10^-2     3.6·10^5         1.30·10^-1
4.25      5.2·10^5         1.42·10^-1    5.0·10^5         1.80·10^-1
4.0       5.1·10^4         1.4·10^-2     1.3·10^5         4.8·10^-2
3.5       1.3·10^4         4·10^-3       3.8·10^4         1.4·10^-2

Total                                    Total
Percentage      = 1.00                   Percentage      = 1.80
Recovered                                Recovered

The total input of BPTI(K15L)-III MA phage was
0.030 ml x (1.2·10^10 pfu/ml) = 3.6·10^8 pfu.

The total input of BPTI(K15L,MGNG)-III MA phage was
0.030 ml x (9.2·10^9 pfu/ml) = 2.8·10^8 pfu.
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TABLE 205 

Fractionation of a Mixture of
BPTI-III MK and
BPTI(K15L,MGNG)-III MA Phage
on Immobilized HNE

          BPTI-III MK                    BPTI(K15L,MGNG)-III MA

          Total                          Total
          Kanamycin        %             Ampicillin       %
pH        Transducing      of Input      Transducing      of Input
          Units                          Units

7.0       4.01·10^3        4.5·10^-3     1.39·10^5        3.13·10^-1
6.0       7.06·10^2        8·10^-4       7.18·10^4        1.62·10^-1
5.0       1.81·10^3        2.0·10^-3     1.35·10^5        3.04·10^-1
4.0       1.49·10^3        1.7·10^-3     7.43·10^5        1.673

The total input of BPTI-III MK phage was
0.015 ml x (5.94·10^9 kanamycin transducing units/ml) =
8.91·10^7 kanamycin transducing units.

The total input of BPTI(K15L,MGNG)-III MA phage was
0.015 ml x (2.96·10^9 ampicillin transducing units/ml) =
4.44·10^7 ampicillin transducing units.
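
The "% of Input" columns in Tables 203 to 206 all use the same arithmetic: units recovered in a fraction, divided by the total input (volume applied times stock titer), expressed as a percentage. A minimal sketch of that calculation follows; the function name `percent_of_input` is illustrative, and the figures in the usage lines are taken from Table 205.

```python
def percent_of_input(recovered_units, volume_ml, titer_per_ml):
    """Percentage of the input phage recovered in one elution fraction."""
    total_input = volume_ml * titer_per_ml  # e.g. 0.015 ml x 5.94e9 units/ml
    return 100.0 * recovered_units / total_input


if __name__ == "__main__":
    # pH 7.0 fraction, BPTI-III MK (kanamycin transducing units)
    print(f"{percent_of_input(4.01e3, 0.015, 5.94e9):.1e}")
    # pH 4.0 fraction, BPTI(K15L,MGNG)-III MA (ampicillin transducing units)
    print(f"{percent_of_input(7.43e5, 0.015, 2.96e9):.3f}")
```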
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TABLE 206 

Characterization of the Affinity of
BPTI(K15V,R17L)-III MA Phage for Immobilized HNE

          BPTI(K15V,R17L)-III MA          BPTI(K15L,MGNG)-III MA

pH        Total Plaque-    Percentage     Total Plaque-    Percentage
          Forming Units    of Input       Forming Units    of Input
          Recovered        Phage          Recovered        Phage

7.0       3.19·10^6        8.1·10^-2      9.42·10^4        4.6·10^-2
6.0       5.42·10^6        1.38·10^-1     1.61·10^5        7.9·10^-2
5.0       9.45·10^6        2.41·10^-1     2.85·10^5        1.39·10^-1
4.5       1.39·10^7        3.55·10^-1     4.32·10^5        2.11·10^-1
4.0       2.02·10^7        5.15·10^-1     1.42·10^5        6.9·10^-2
3.75      9.20·10^6        2.35·10^-1
3.5       4.16·10^6        1.06·10^-1     5.29·10^4        2.6·10^-2
3.0       2.65·10^6        6.8·10^-2

Total Input     = 1.73                    Total Input     = 0.57
Recovered                                 Recovered

Total input of BPTI(K15V,R17L)-III MA phage was
0.040 ml x (9.80·10^10 pfu/ml) = 3.92·10^9 pfu.

Total input of BPTI(K15L,MGNG)-III MA phage was
0.040 ml x (5.13·10^9 pfu/ml) = 2.05·10^8 pfu.
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TABLE 207 

Sequence of the EpiNEα Clone Selected
From the Mini-Library

 1   1   1   1   1   1   1   2   2
 3   4   5   6   7   8   9   0   1
 P   C   V   A   M   F   Q   R   Y
CCT.TGC.GTG.GCT.ATG.TTC.CAA.CGC.TAT
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TABLE 208 . 

SEQUENCES OF THE EpiNE CLONES IN THE P1 REGION

CLONE            SEQUENCE
IDENTIFIERS
                  1   1   1   1   1   1   1   2   2
                  3   4   5   6   7   8   9   0   1

3, 9, 16,         P   C   V   G   F   F   S   R   Y      EpiNE3
17, 18, 19       CCT.TGC.GTC.GGT.TTC.TTC.TCA.CGC.TAT

6                 P   C   V   G   F   F   Q   R   Y      EpiNE6
                 CCT.TGC.GTC.GGT.TTC.TTC.CAA.CGC.TAT

7, 13, 14,        P   C   V   A   M   F   P   R   Y      EpiNE7
15, 20           CCT.TGC.GTC.GCT.ATG.TTC.CCA.CGC.TAT

4                 P   C   V   A   I   F   P   R   Y      EpiNE4
                 CCT.TGC.GTC.GCT.ATC.TTC.CCA.CGC.TAT

8                 P   C   V   A   I   F   K   R   S      EpiNE8
                 CCT.TGC.GTC.GCT.ATC.TTC.AAA.CGC.TCT

1, 10,            P   C   I   A   F   F   P   R   Y      EpiNE1
11, 12           CCT.TGC.ATC.GCT.TTC.TTC.CCA.CGC.TAT

5                 P   C   I   A   F   F   Q   R   Y      EpiNE5
                 CCT.TGC.ATC.GCT.TTC.TTC.CAA.CGC.TAT

2                 P   C   I   A   L   F   K   R   Y      EpiNE2
                 CCT.TGC.ATC.GCT.TTG.TTC.AAA.CGC.TAT
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Table 209: DNA sequences and predicted amino acid 
sequences around the P1 region of BPTI analogues selected
for binding to Cathepsin G.



Clone          P1
               15    16    17    18    19

BPTI           AAA . GCG . CGC . ATC . ATC
               LYS   ALA   ARG   ILE   ILE

EpiC 1 (a)     ATG . GGT . TTC . TCC . AAA
               MET   GLY   PHE   SER   LYS

EpiC 7         ATG . GCT . TTG . TTC . AAA
               MET   ALA   LEU   PHE   LYS

EpiC 8         TTC . GCT . ATC . ACC . CCA
               PHE   ALA   ILE   THR   PRO

EpiC 10        ATG . GCT . TTG . TTC . CAA
               MET   ALA   LEU   PHE   GLN

EpiC 20        ATG . GCT . ATC . TCC . CCA
               MET   ALA   ILE   SER   PRO

(a) Clones 11 and 31 also had the identical sequence.

(b) Clone 8 also contained the mutation Tyr 10 to Asn.
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Table 210 
Derivatives of EpiNE7 Obtained 
by Variegation at positions 34, 36, 39, 40 and 41 

EpiNE7
              ♦♦♦♦♦                   ****
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFVYGGCmgngNNFKSAEDCMRTCGGA
         1         2         3         4         5
1234567890123456789012345678901234567890123456789012345678

EpiNE7.6
              +++++              ♦ ♦  ♦♦♦
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFlYgGCkgkGNNFKSAEDCMRTCGGA
EpiNE7.8, EpiNE7.9, and EpiNE7.31
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFeYgGCwakGNNFKSAEDCMRTCGGA
EpiNE7.11
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFgYaGCrakGNNFKSAEDCMRTCGGA
EpiNE7.7
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFeYgGChaeGNNFKSAEDCMRTCGGA
EpiNE7.4 and EpiNE7.14
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFlYgGCwaqGNNFKSAEDCMRTCGGA
EpiNE7.5
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFrYgGClaeGNNFKSAEDCMRTCGGA
EpiNE7.10 and EpiNE7.20
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFdYgGChadGNNFKSAEDCMRTCGGA
EpiNE7.1
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFkYgGClahGNNFKSAEDCMRTCGGA
EpiNE7.16
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFtYgGCwanGNNFKSAEDCMRTCGGA
EpiNE7.19
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFnYgGCegkGNNFKSAEDCMRTCGGA
EpiNE7.12
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFqYgGCegyGNNFKSAEDCMRTCGGA
EpiNE7.17
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFqYgGClgeGNNFKSAEDCMRTCGGA
EpiNE7.21
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFhYgGCwgqGNNFKSAEDCMRTCGGA
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Table 210: Derivatives of EpiNE7 Obtained 
by Variegation at positions 34, 36, 39, 40 and 41 

(continued) 



EpiNE7
              ♦♦♦♦♦                   ****
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFVYGGCmgngNNFKSAEDCMRTCGGA
         1         2         3         4         5
1234567890123456789012345678901234567890123456789012345678

EpiNE7.22
              +++++              ♦ ♦  ♦♦♦
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFhYgGCwgeGNNFKSAEDCMRTCGGA
EpiNE7.23
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFkYgGCwgkGNNFKSAEDCMRTCGGA
EpiNE7.24
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFkYgGChgnGNNFKSAEDCMRTCGGA
EpiNE7.25
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFpYgGCwakGNNFKlAEDCMRTCGGA
EpiNE7.26
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFkYgGCwghGNNFKSAEDCMRTCGGA
EpiNE7.27
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFnYgGCwgkGNNFKSAEDCMRTCGGA
EpiNE7.28
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFtYgGClghGNNFKSAEDCMRTCGGA
EpiNE7.29
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFtYgGClgyGNNFKSAEDCMRTCGGA
EpiNE7.30, EpiNE7.34, and EpiNE7.35
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFkYgGCwaeGNNFKSAEDCMRTCGGA
EpiNE7.32
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFgYgGCwgeGNNFKSAEDCMRTCGGA
EpiNE7.33
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFeYgGCwanGNNFKSAEDCMRTCGGA
EpiNE7.36
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFvYgGChgdGNNFKSAEDCMRTCGGA
EpiNE7.37
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFmYgGCqgkGNNFKSAEDCMRTCGGA
EpiNE7.38
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFyYgGCwakGNNFKSAEDCMRTCGGA
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Table 210 (continued) 
Derivatives of EpiNE7 Obtained 
by Variegation at positions 34, 36, 39, 40 and 41 

EpiNE7
              ♦♦♦♦♦                   ****
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFVYGGCmgngNNFKSAEDCMRTCGGA
         1         2         3         4         5
1234567890123456789012345678901234567890123456789012345678

EpiNE7.39
              +++++              ♦ ♦  ♦♦♦
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFmYgGCwgdGNNFKSAEDCMRTCGGA
EpiNE7.40
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFtYgGChgnGNNFKSAEDCMRTCGGA
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Table 210: Derivatives of EpiNE7 Obtained 
by Variegation at positions 34, 36, 39, 40 and 41 

(continued) 

Notes:

a) ♦ indicates variegated residue. * indicates imposed
change. + indicates carry over from EpiNE7.

b) The sequence M39-GNG in EpiNE7 (indicated by *) was
imposed to increase similarity to ITI-D1.

c) Lower case letters in EpiNE7.6 to 7.38 indicate
changes from BPTI that were selected in the first
round (residues 15-19) or positions where the PBD was
variegated in the second round (residues 34, 36, 39,
40, and 41).

d) All EpiNE7 derivatives have G42.
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TABLE 211 

Effects of antisera on phage infectivity

Phage           Incubation                     Relative
(dilution       Conditions      pfu/ml         Titer
of stock)

MA-ITI          PBS             1.2·10^11      1.00
(10^-1)         NRS             6.8·10^10      0.57
                anti-ITI        1.1·10^10      0.09

MA-ITI          PBS             7.7·10^8       1.00
(10^-3)         NRS             6.7·10^8       0.87
                anti-ITI        8.0·10^6       0.01

MA              PBS             1.3·10^12      1.00
(10^-1)         NRS             1.4·10^12      1.10
                anti-ITI        1.6·10^12      1.20

MA              PBS             1.3·10^10      1.00
(10^-3)         NRS             1.2·10^10      0.92
                anti-ITI        1.5·10^10      1.20
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TABLE 212 

Fractionation of EpiNE-7 and MA-ITI phage on HNE beads 

                  EpiNE-7                     MA-ITI

Sample            Total pfu     Fraction      Total pfu     Fraction
                  in sample     of input      in sample     of input

INPUT             3.3·10^9      1.00          3.4·10^11     1.00

Final
TBS-TWEEN         3.8·10^5      1.2·10^-4     1.8·10^6      5.3·10^-6
Wash

pH 7.0            6.2·10^5      1.8·10^-4     1.6·10^6      4.7·10^-6
pH 6.0            1.4·10^6      4.1·10^-4     1.0·10^6      2.9·10^-6
pH 5.5            9.4·10^5      2.8·10^-4     1.6·10^6      4.7·10^-6
pH 5.0            9.5·10^5      2.9·10^-4     3.1·10^5      9.1·10^-7
pH 4.5            1.2·10^6      3.5·10^-4     1.2·10^5      3.5·10^-7
pH 4.0            1.6·10^6      4.8·10^-4     7.2·10^4      2.1·10^-7
pH 3.5            9.5·10^5      2.9·10^-4     4.9·10^4      1.4·10^-7
pH 3.0            6.6·10^5      2.0·10^-4     2.9·10^4      8.5·10^-8
pH 2.5            1.6·10^5      4.8·10^-5     1.4·10^4      4.1·10^-8
pH 2.0            3.0·10^5      9.1·10^-5     1.7·10^4      5.0·10^-8

SUM*              6.4·10^6      3·10^-3       5.7·10^6      2·10^-5

* SUM is the total pfu (or fraction of input) obtained from
all pH elution fractions
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TABLE 213 

Fractionation of EpiC-10 and MA-ITI phage on Cat-G beads 

                  EpiC-10                     MA-ITI

Sample            Total pfu     Fraction      Total pfu     Fraction
                  in sample     of input      in sample     of input

INPUT             5.0·10^11     1.00          4.6·10^11     1.00

Final
TBS-TWEEN         1.8·10^7      3.6·10^-5     7.1·10^6      1.5·10^-5
Wash

pH 7.0            1.5·10^7      3.0·10^-5     6.1·10^6      1.3·10^-5
pH 6.0            2.3·10^7      4.6·10^-5     2.3·10^6      5.0·10^-6
pH 5.5            2.5·10^7      5.0·10^-5     1.2·10^6      2.6·10^-6
pH 5.0            2.1·10^7      4.2·10^-5     1.1·10^6      2.4·10^-6
pH 4.5            1.1·10^7      2.2·10^-5     6.7·10^5      1.5·10^-6
pH 4.0            1.9·10^6      3.8·10^-6     4.4·10^5      9.6·10^-7
pH 3.5            1.1·10^6      2.2·10^-6     4.4·10^5      9.6·10^-7
pH 3.0            4.8·10^5      9.6·10^-7     3.6·10^5      7.8·10^-7
pH 2.5            2.0·10^5      4.0·10^-7     2.7·10^5      5.9·10^-7
pH 2.0            2.4·10^5      4.8·10^-7     3.2·10^5      7.0·10^-7

SUM*              9.9·10^7      2·10^-4       1.4·10^7      3·10^-5

* SUM is the total pfu (or fraction of input) obtained from
all pH elution fractions
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TABLE 214 

Abbreviated fractionation of display phage on HNE beads 

                              DISPLAY PHAGE

              EpiNE-7        MA-ITI 2       MA-ITI-E7 1    MA-ITI-E7 2

INPUT         1.00           1.00           1.00           1.00
(pfu)         (1.8·10^9)     (1.2·10^10)    (3.3·10^9)

WASH          6·10^-5        1·10^-5        2·10^-5        2·10^-5
pH 7.0        3·10^-4        1·10^-5        2·10^-5        4·10^-5
pH 3.5        3·10^-3        3·10^-6        8·10^-5        8·10^-5
pH 2.0        1·10^-3        1·10^-6        6·10^-6        2·10^-5

SUM*          4.3·10^-3      1.4·10^-5      1.1·10^-4      1.4·10^-4

* SUM is the total fraction of input pfu obtained from all
pH elution fractions
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TABLE 215 

Fractionation of EpiNE-7 and MA-ITI-E7 phage on HNE beads 
                  EpiNE-7                     MA-ITI-E7

Sample            Total pfu     Fraction      Total pfu     Fraction
                  in sample     of input      in sample     of input

INPUT             1.8·10^9      1.00          3.0·10^9      1.00

pH 7.0            5.2·10^5      2.9·10^-4     6.4·10^4      2.1·10^-5
pH 6.0            6.4·10^5      3.6·10^-4     4.5·10^4      1.5·10^-5
pH 5.5            7.8·10^5      4.3·10^-4     5.0·10^4      1.7·10^-5
pH 5.0            8.4·10^5      4.7·10^-4     5.2·10^4      1.7·10^-5
pH 4.5            1.1·10^6      6.1·10^-4     4.4·10^4      1.5·10^-5
pH 4.0            1.7·10^6      9.4·10^-4     2.6·10^4      8.7·10^-6
pH 3.5            1.1·10^6      6.1·10^-4     1.3·10^4      4.3·10^-6
pH 3.0            3.8·10^5      2.1·10^-4     5.6·10^3      1.9·10^-6
pH 2.5            2.8·10^5      1.6·10^-4     4.9·10^3      1.6·10^-6
pH 2.0            2.9·10^5      1.6·10^-4     2.2·10^3      7.3·10^-7

SUM*              7.6·10^6      4.1·10^-3     3.1·10^5      1.1·10^-4

* SUM is the total pfu (or fraction of input) obtained from
all pH elution fractions
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